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SUPERCOMPUTER IMPLEMENTATION OF FINITE ELEMENT ALGORITHMS 
FOR HIGH SPEED COMPRESSIBLE FLOWS 

By 

Earl A. Thornton 1 and R. Ramakrishnan 2 
ABSTRACT 

Prediction of compressible flow phenomena using the finite element 
method is of recent origin and considerable interest. Two shock capturing 
finite element formulations for high speed compressible flows are described. 
A Taylor-Galerkin formulation uses a Taylor series expansion in time coupled 
with a Galerkin weighted residual statement. The Taylor-Galerkin algorithms 
uses explicit artificial dissipation, and the performance of three dissi- 
pation models are compared. A Petrov-Galerkin algorithm has as its basis 
the concepts of streamline upwinding. Vectorization strategies are develop- 
ed to implement the finite element formulations on the NASA Langley VPS-32. 
The vectorization scheme results in finite element programs that use vectors 
of length of the order of the number of nodes or elements. The use of the 
vectorization procedure speeds up processing rates by over two orders of 
magnitude. The Taylor-Galerkin and Petrov-Galerkin algorithms are evaluated 
for 2D inviscid flows on criteria such as solution accuracy, shock resolu- 
tion, computational speed and storage requirements. The convergence rates 
for both algorithms are enhanced by local time-stepping schemes. Extension 
of the vectorization procedure for predicting 2D viscous and 3D inviscid 
flows are demonstrated. Conclusions are drawn regarding the applicability 
of the finite element procedures for realistic problems that require hun- 
dreds of thousands of nodes. 

iProfessor, Department of Mechanical Engineering, Old Dominion University, 
Norfolk, Virginia 23508. 

2 Graduate Research Assistant, Department of Mechanical Engineering, Old 
Dominion University, Norfolk, Virginia 23508. 
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Chapter 1 
INTRODUCTION 


Until the late 19th century scientific analyses were non-existent 
in engineering practice and engineers relied heavily on empirical rela- 
tions. The fields of engineering and mathematics were considered to be 
totally unrelated. The great mathematician Hilbert was said to have 
pronounced, "The mathematician and the engineer have nothing to do with 
each other and never will." The historic flight of the Wright brothers 
in 1903 shattered this notion and ushered in the era of scientific 
technology wherein research and its practical applications proceeded in 
parallel. The aerodynamicist of today is very much aware of the strong 
interplay between mathematics, engineering, and numerical analysis in 
the prediction of flow behavior around flight vehicles. 

A system of nonlinear equations of special interest to aerody- 
namicists are the compressible flow equations. The compressible in- 
viscid equations or the Euler equations describe flow of a friction- 
less, non-heat conducting fluid. The addition of viscosity and heat 
dissipation to the inviscid equations results in the compressible, vis- 
cous Navier-Stokes equations. Numerical studies aimed at predicting 
the flow features these equations describe have mushroomed due to the 
availability of bigger and better "number crunchers." A new field 
known as Computational Fluid Dynamics (CFD) has evolved which is 
devoted to the numerical simulation of fluid dynamic equations. 
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The role of CFD in fluid dynamics and aerodynamics has seen a dra- 
matic increase in the last few years. A new breed of computational 
fluid dynamicists are daring to dream of a future free from the tyranny 
of wind tunnels. The rapid advancements in computer technology with 
the proliferation of supercomputers and the evolution of ultra -com- 
puters indicate that those dreams are fast assuming proportions of 
reality. The role of CFD has also received a boost with recent initia- 
tives to develop the hypersonic research airplane, the "Orient 
Express." Numerical simulations will have a major role in the design 
and development of the hypersonic research aircraft as well as in the 
design of transatmospheric vehicles (TAVs) that form part of the con- 
troversial SDI or "Star Wars" concept. 

Supersonic flight vehicles such as the Anglo-French "Concorde" 
cruise at speeds exceeding Mach 2. The proposed hypersonic research 
aircraft is envisaged to have applications in the range of Mach 6 to 
Mach 25. At these extremely high speeds, the aerodynamic heating on 
vehicle surfaces have substantial effects on flight performance. 
Deformations and stresses that result from heating effects are signifi- 
cant and means of predicting these effects .are clearly necessary. 
Detailing deformations and stresses due to aerodynamic heating requires 
integrated fluid-thermal -structural analyses. The accurate prediction 
of high speed compressible flow features described by the Navier-Stokes 
equations forms the backbone of such integrated analyses. 
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1.1 Background 

The use of computers to predict flow features has become an 
important and indispensable part of understanding flow behavior. Use 
of digital computers to solve problems of fluid dynamics started in the 
early forties of this century [1]* and has continued ever since. Until 
the early seventies the method of finite differences enjoyed exclusive 
use in efforts aimed at predicting flow features. It was only in the 
early seventies that researchers began to consider the use of finite 
element methods for flow analyses. Early applications of finite ele- 
ment methods to flow problems were in the incompressible flow domain 
[2, 3]. During the last three years researchers have developed the 
first finite element formulations for prediction of high speed compres- 
sible inviscid and viscous flows. 

A tour d' horizon of research efforts for incompressible and com- 
pressible flow simulations using the finite element method appears in 
[4, 5]. Most of the literature in finite element compressible flows 
deal with transonic flows. Potential flow formulations have been 
developed by Ecer and Akay [6] and extensions for the Euler equations 
in non -conservative form are presented in [7]. The use of implicit and 
explicit procedures to solve inviscid transonic problems about airfoils 
and engine inlets is detailed by Ararand, et al. [8, 9]. A Galerkin 
finite element formulation was recently used by Jameson [10] to study 
the flow features about an entire Boeing 747 aircraft. Two families of 
finite element formulations for high speed compressible flow analyses 
have evolved: the Taylor-Galerkin [11, 13] and the Petrov-Galerkin 
[14, 16] formulations. 


♦Numbers in brackets indicate references 
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A question that often arises, especially from devotees of the 
finite difference method, is why use finite element methods for com- 
pressible flow computations?' Admittedly, finite difference methods 
have reached a high level of sophistication, but some capabilities of 
the finite element method suggest that the method deserves investiga- 
tion. The need for integrated fluid-thermal -structural analyses was 
mentioned earlier, and the finite element method lends itself well to 
such an approach. The capability to model complex flow domains with 
relative ease is also one of its major selling points. Compressible 
flow situations are often characterized by regions that need to be 
adaptively refined. The geometric flexibility of finite elements makes 
it highly amenable to mesh refinement procedures. Factors such as 
these have infused new interest in the application of finite element 
methods to compressible flow problems. 

A research effort is underway at the NASA Langley Research Center, 
in concert with industry and university researchers, to improve the 
capability and efficiency of finite element methods for high speed com- 
pressible flows and to develop more efficient integration of finite 
element fluid, thermal and structural analyses. The culmination of 
these research efforts will be the ability to predict accurately aero- 
thermal loads for complex three dimensional bodies. 

1.2 Purpose 

The need for integrated fluid-thermal -structural analyses and the 
motivations for choosing the finite element method for such analyses 
appear in earlier sections. Finite element formulations suitable for 
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compressible flow calculations are of recent origin. The Taylor- 
Galerkin and Petrov-Galerkin formulations continue to evolve being 
modified to enhance their capabilities for accurate solutions. A 
Flux-Corrected-Transport (FCT) version of the Taylor-Galerkin was 
developed recently [17], and modifications of the Petrov-Galerkin 
formulations continue [18]. 

The procedure to simulate the characteristics of compressible 
fluid flow involves, in addition to the finite element algorithm, 
issues such as fast-and-easy model generation, efficient programming 
strategies to implement the algorithms, and results display using color 
graphics. In this study the generation of the flow model and the 
results display are done using the commercially available software 
package, PATRAN [19]. 

The use of efficient programming strategies is of major importance 
to predict flow behavior around complex 3D 'configurations. The purpose 
of this study is to develop computational procedures for implementing 
finite element formulations for compressible flows on a supercomputer. 
The highly vectorized computational procedures can be used to analyze 
compressible inviscid and viscous flows and can be extended to develop 
integrated fluid-thermal -structural analyses. 

The basic concepts of shock capturing methods which are based on 
the theory of weak solutions are introduced in Chap. 2. Two shock 

capturing finite element formulations are then introduced. The 
Taylor-Galerkin formulation uses explicit artificial dissipation and 
three popular dissipation models are explored. The Petrov-Galerkin 
formulation, being based on the concepts of upwinding, needs no 
explicit artificial dissipation. Chapter 3 discusses the need for 
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vectori zati on strategies and develops procedures to vectorize the 
finite element algorithms effectively on the NASA Langley VPS-32. The 
effects of local time-stepping procedures and the use of different 
explicit dissipation models for the Taylor-Galerkin finite element for- 
mulation are illustrated in Chap. 4. The Taylor-Galerkin and Petrov- 
Galerkin formulations are evaluated for a variety of 2D inviscid prob- 
lems. Evaluation criteria include solution accuracy, computational 
speed, and storage requirements. Extension of the vectorization proce- 
dure to 2D viscous flows using the Petrov-Galerkin formulation appears 
in Chap. 5. The finite element methodology for 3D inviscid compres- 
sible flow computations using the Taylor-Galerkin formulation is 
described in Chap. 6. Finally, conclusions are drawn regarding the 
performance of the two vectorized finite element formulations for in- 
viscid and viscous flows, and recommendations are made for further 


research. 



Chapter 2 

SHOCK CAPTURING FINITE ELEMENT ALGORITHMS 


Compressible flow problems involve flow situations that may con- 
tain shocks, contact surfaces and expansions. Shocks are difficult to 
model being characterized by abrupt changes in all variables across a 
very thin region. In reality, the shock occurs across spatial dimen- 
sions of the order of microns, but due to computational limitations, 
shocks are modelled as spread over a few grid points. A numerical 
scheme is rated according to how well it captures the shock - the 
crisper the shock, the better the method. 

Prediction of shock location and strength can be achieved by 
either "shock fitting" or "shock capturing." Shock fitting methods use 
the Rankine-Hugoniot relations to fit the shock relative to the flow 
field [20]. Shock capturing methods, on the other hand, predict shocks 
and other discontinuities as part of the solution. Though the pre- 
dicted shocks are smeared a bit, the generality of the concept makes 
the method attractive. 


2.1 Theory of Weak Solutions 

The principles of shock capturing are based on the theory of weak 
solutions of hyperbolic equations. Consider the hyperbolic system of 


6 
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equations given by, 

U, t + F, x = 0 (2.1) 

where U is a vector of unknowns and is a. function of x and t. F is a 
vector function of U, x and t. The above equation may be written in 
coefficient form as, 

U, t + A U, x » 0 (2.2) 

where A, the coefficient matrix, is given by, 

A = F i[} (2.3) 

Since Eq. (2.1) is a hyperbolic system of equations, the eigenvalues of 
matrix A are all real. 

Nonlinear hyperbolic partial differential equations exhibit two 
types of solutions, weak and genuine solutions. A genuine solution of 
the above equations occurs when U is continuous over the domain but 
derivatives of U may be discontinuous. A weak solution, on the other 
hand, occurs when U is continuous in the domain, except along a line or 
surface where U may be discontinuous. The presence of shocks in super- 
sonic flows is an indication of the presence of weak solutions for the 
hyperbolic equations. Let the solution U(x,t) of Eq. (2.1) be 
subjected to the initial data, 

U ( x , o ) = $(x) (2.4) 

Let w(x,t) be a test function that satisfies the integral version of 
Eq. (2.1), 

JJ (w, t U + w, x F) dxdt + J w(x,0) <$>(*) dx = 0 (2.5) 

which is obtained by multiplying Eq. (2.1) by w(x,t) and integrating by 
parts [21]. If U is a weak solution it can be shown that across a line 
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of discontinuity, 

F(U r ) - F(U l ) = S(U R - U L ) (2.6) 

where S is the speed of propagation of the discontinuity and U[_ and 
Ur are the states to the left and right of the discontinuity, 
respectively. For the Euler equations these relations are the shock 
relations or the "Rankine-Hugoniot" relations. 

Weak solutions exhibit nonuniqueness for an initial value 

problem. For instance, the solution of the Burgers equation. 


u, t + f, x = 0 (2.7) 

can be either of two weak solutions. For Eq. (2.7) with initial 
conditions of 


u(x,0) 

two solutions are possible: 

u(x,t) 


and 



x < 0 
x > 0 

x < 0 
0 < x < t 
t < x 


( 2 . 8 ) 


(2.9) 


u(x,t) 



2 x < t 
2 x > t 


( 2 . 10 ) 


This implies that the initial value problem for weak solutions is not a 
meaningful one. If the mathematical model is to reflect physically 
relevant solutions, an additional principle is needed to select a 
unique weak solution. One principle that has found wide acceptance is: 
"Weak solutions that occur in nature are limits of viscous flow." This 
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statement forms the basis of the artificial viscosity methods. Adding 
a diffusion term to Eq. (2.1) results in a nonlinear parabolic system 
gi ven by , 

U ’t + F, x = e U, xx (2.11) 
where e is the dissipation coefficient. The initial value problem can 
be solved for a wide variety of initial conditions, and it can be shown 
that as e becomes smaller the corresponding solutions converge 
boundedly to the "right" solution for Eq. (2.1). The addition of this 
viscosity term does not exactly correspond to the addition of viscosity 
or heat conduction, but does produce solutions that "make sense." The 
use of artificial viscosity in simulating compressible flow is common 
for finite difference methods. The pioneering work in this area was 
done by Von Neumann and Richtmyer [22] who introduced an artificial 
dissipation term into the equations so as to give shocks a thickness 
comparable to the grid spacing. The popular finite difference methods 
such as those of MacCormack [23], Beam and Warming [24], and Burstein 
[25] use the concept of artificial viscosity in combination with second 
order difference schemes. 

A number of schemes that are currently popular are those based on 
upwind differencing. These schemes use the concepts of characteristic 
theory and wave propagation. They are physically consistent and pro- 
duce sharp shocks without explicit artificial viscosity. Schemes such 
as those of Steger and Warming [26], Roe [27], and van Leer [28], are 
upwind schemes that contain internal dissipation due to the one-sided 
differencing. Upwind schemes can be shown to be equivalent to central 
difference schemes with added dissipation [29]. 
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The Taylor-Galerkin formulation which uses the concepts of explicit 
artificial viscosity, and the Petrov-Galerkin method which is based on 
the concepts of upwinding are shock -capturing finite element 
formulations. 


2.2 Euler Equations 


The conservation equations can be written in two dimensions as. 


U, t+ Fi i -0 (2.12) 

where U is a vector of conservation variables, and Fj are flux com- 
ponents of the mass, momentum and energy in the coordinate directions. 
The index i denotes the component in the i -th direction, and a comma 
denotes partial differentiation. Repeated indices indicate summation 
over the range of i. The vector U and the flux vectors Fj are given 


by. 


U = p 


Fj = u.U + p J 


'11 

2i 


(2.13) 


where p is the density, Uj are the velocity components in the 
coordinate directions, and Ej; is the total energy. The Kronecker 
delta 6jj is defined as. 



0 i*j 

1 i=j 


(2.14) 


and the pressure p as. 


p = (y-1) p[E t - 0.5(u. U j)] (2.15) 

Equation (2.12) is solved subject to proper initial and boundary 
conditions. 
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2.3 Taylor-Galerkin Algorithm 

The Taylor-Galerkin algorithm was first proposed by Donea for the 
advection equation [30]. The concept was then applied to the inviscid 
Euler equations by Lohner, Morgan and Zienkiewicz [11, 12]. One step 
Taylor-Galerkin and two step Taylor-Galerkin formulations [13] have 
been applied to high speed inviscid equations. 

2.3.1 Finite Element Formulation 

The Taylor-Galerkin formulation is easier to derive by considering 
just one variable. For a scalar variable u the typical equation is, 

u, t + F iji = 0 (2.16) 

where u, F-j are analogus to the corresponding vector quantities in 
Eq. (2.12). The Taylor-Gal erkin formulation used in this dissertation 
is a two step method with element quantities being calculated at the 
first step and nodal quantities at the second step. A detailed 
derivation of the algorithm is presented in [4]; for brevity, only the 
key equations are presented herein. 

Time level t n+ i/2 : 

The constant element value 1S computed from, 

. n+1/2 = f [N]dA{u} n -^ f [N,.] dA {F.} n (2.17) 

A u d J A c -»A 1 

where A denotes an element area. At is the time step, and [N] is a row 

matrix of element interpolation functions. On the outflow plane the 

element side quantities are computed from. 
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L u 


n+1/2 


■k 


[N $ ] ds 


{u s }' 


At 

T 


k 


[H s ] ds (F f>1 ) 


(2.18) 


In the above [N s ] denotes the interpolation function of the flux 
components on the outflow surfaces, and L is the outflow surface 
length. The interpolation of the flux quantities on the boundary 

differs from that on the interior. Other variations of calculating the 
outflow terms, and the rationale behind those calculations appear in 
[4]. The quantities obtained at the half step are used to calculate 
the nodal quantities at the second step, t=t n+ i* 


Time level t n+ i: 

An approximation to the Taylor series expansion of u at tp+i /2 
and the application of the weighted residual statement on the resulting 
equations yield, 

CM] { 6 u} n+1 = At f [N,.] dA F? +1/2 + {R} n+1/2 (2.19) 

Ja 1 1 


where [M] is the element consistent mass matrix given by. 


[M] = f [N] T [N] dA 


( 2 . 20 ) 


and the load vector { R> is given by. 


{R} n+1/2 = _ At f i f"T 1/2 {N} ds 
J [_ 1 s 1 


( 2 . 21 ) 


In Eq. (2.21) If are the components of the unit normal surface vector 
n. The flux components on the surface F S1 - are computed using the 
surface quantities obtained at the half step from Eq. (2.18). 

The element integrals that appear in Eqs. (2.17M2.21) can be 
evaluated in closed form to avoid numerical integration that is 
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common in many finite element formulations. Numerical integration, 
typically Gauss quadrature is expensive and for three dimensions 
increases computer storage requirements appreciably. The element inte- 
grals that appear in Eqs. (2.17M2.21) are evaluated just once, out- 
side the time-step loop, making the time-marching scheme very effi- 
cient. Reference 31 explains the Taylor-Galerkin formulation used and 
details the explicit element integral evaluations used in the implemen- 
tation of the Taylor-Galerkin algorithm. 

2.3.2 Artificial Dissipation Models 

To guarantee physically consistent results and to stabilize the 
computations, artificial dissipation is added at the end of each time- 
step. The a-posteriori smoothing is given by the relation, 

u£ +1 = u n+1 + D(u n+1 ) (2.22) 

where D(u) is a diffusion operator that depends on the artificial vis- 
cosity model employed. A variety of artificial dissipation models 
exist, and three popular models are investigated for use with the 
Taylor-Galerkin algorithm. 

Artificial dissipation models that have found wide use in CFD 
literature include the dissipation models due to Lapidus [32], 
MacCormack and Baldwin [33], and the blended higher order differencing 
scheme due to Jameson [34]. 

2. 3. 2.1 Lapidus dissipation model : The dissipation operator due to 
Lapidus [32] is second order and uses velocity gradients as dissipation 
coefficients. The addition of artificial dissipation is based on the 



14 


relation. 


u|! +1 . „" +1 + L (u" +1 ) (2.23) 

where u = u(xj,t) is the scalar unknown in Eq. (2.16) and L(u) is the 
dissipation operator, the dissipation being added at the end of each 
time-step. The dissipation operator L(u) is defined as. 


where. 


L(u) 



(2.24) 


E. = kj u,^ (i not summed) (2.25) 

The coefficients or pointers, k-f, indicate where and how much dissi- 
pation is added in specific regions of the flowfield. The coefficients 
kj are given by. 


k.j s v At A | | (i not summed) (2.26) 

where u-f are the velocity components in the coordinate directions, A 
is the element area, v is the lapidus constant, and At is the time- 
step. 

The method of weighted residuals applied to Eq. (2.23) results in 
the element equation, 

f [N] T u " +1 dA [N] T u n+1 dA +[ [N] T E. . dA (2.27) 

J A s J A -»A 1,1 


Using the Green-Gauss theorem for the second order terms, Eq. (2.27) 
reduces to. 


f Cn] t 

J A 


au " +1 dA = -/ E • [N, .] dA + / [N] E. . ds 


i l 


k 




(2.28) 


where 


AU $ = u" +1 - U P+1 


=■ [N] {Au s } n+1 


(2.29) 
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The second term in Eq. (2.28) is the boundary term that results due to 
the integration by parts. The boundary term vanishes at the outflow 
surfaces, and Eq. (2.28) can be written as, 

CM] (Au } n+1 = -f E. [N, .] dA (2.30) 

s J a 1 1 

where [M] is the element mass matrix that appears in Eq. (2.20). 


2. 3. 2. 2 MacCormack -Baldwin dissipation model : The MacCormack -Baldwin 

dissipation model [33] uses the second derivative of pressure as a 
pointer for the amount of artificial dissipation to be added. The 
addition of artificial diffusion is given by the relation. 


u n+l = u n+l + MB ( u n+ l) (2.31) 

where the operator MB(u), the MacCormack -Bal dwin dissipation operator 
is similar in form to the Lapidus dissipation operator and is defined 


as. 


MB(u) = E. . 

' 9 * 


(2.32) 


The coefficients kj are given by, 


k i = ^ P ” fi 


(i not summed) 


(2.33) 


The use of the method of weighted residuals and the Green-Gauss 
theorem results in the element equation similar to Eq. (2.30) as. 


[M] Uu s ) n+1 = -J E. [N,.] dA 


(2.34) 


where coefficients k-j are given by Eq. (2.33). 

The calculation of the coefficients k-j involve obtaining the 
second derivatives of the pressure with respect to the coordinate 
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directions. The finite element formulation detailed in the earlier 
sections assumes a bi -linear variation of the conservation variables 
within an element. If the pressure is interpolated similarly the 
second derivatives of the pressure vanish. One way to circumvent this 
problem is to make use of the Green's formula shown in Appendix A. 
Appendix A shows that the second derivatives of the pressure at the 
nodes of an element can be obtained from the equation, 

CM] j '241- -f CM.] dA p y (2.35) 

Lax^i Ja x ,x 

where, p, x , the first derivative of pressure, is computed at the 
Gauss points. The assemblage of the element quantities Eq. (2.35) and 
solution of the global equations yields the second derivatives of 
pressure at the nodes. The procedure for obtaining the second 
derivatives with respect to the other coordinate direction is similar. 

2. 3. 2. 3 Jameson dissipation model : The dissipation model due to 

Jameson [34] is a blend of second and fourth derivatives with the 
coefficients depending on the pressure gradients. The second and 
fourth derivatives are adaptively blended such that at shocks the 
second derivatives come into play, while in smooth regions the fourth 
derivative terms are used. The smoothing is done according to the 
equation, 

u n+l = u n+l + J(u n+1 ) (2.36) 

where J(u), the blended smoothing operator, is defined as. 



(2.37) 
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The dissipative fluxes Ej are given by. 


A 


4 


k i u ’i 


k i u *iii 


(i not summed) 


(i not summed) (2.38) 


The coefficients kj and k? are functions of the local 
gradients and can be written as, 

k ! * 

k ? * A *«1 


pressure 


where 


«} • e< 2 >A 1 


3p | P M1 


5? = max (0,e^ - 5 j) 


(2.39) 


and e(2) and e(4) are constants. Equations (2.37M2.39) also 
indicate that when e^ 4 ^ is zero, the fourth difference terms vanish 
resulting in the MacCormack -Baldwin dissipation operator. 

The use of the weighted residual formulation and the Green-Gauss 
theorem yields the finite element equations, 

CM] {au l n+1 = - f e} [N .] dA + f E? [N .] dA 

s J A 1 -J A 1 (2.40) 

In addition to the second derivatives of pressure, the Jameson 
dissipation model requires the computation of the third derivatives of 
the conservation variables. The second derivatives of the conservation 
variables are obtained at the nodes of an element as, 

(i not summed) 


r 

CM] (U = - [N ,] [N] dA (U .} 


(2.41) 


where the first derivatives of the conservation variables are 
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interpolated linearly within an element. The third derivatives are 
then obtained from, 

CM] { U , . . . } = - f [N .] [N] dA { U, . (i not summed) (2.42) 

in j ^ ,1 n 

The second derivatives of the pressure are calculated with the first 
derivatives computed at the Gauss points while the third derivatives of 
the conservation variables are computed assuming linear variation of 
the first derivatives within an element. 

Numerical results for the Taylor-Galerkin finite element formula-, 
tion with the lapidus, MacCormack-Bal dwin, and Jameson artificial 
dissipation models are presented in Chap. 4. 

2.4 Petrov-Galerkin Algorithm 

The method has as its basis the streamline upwind concepts derived 
by Hughes and Brooks [35]. The addition of diffusion to stabilize the 
computation is along streamlines. The directional characteristic of 
the added diffusion avoids crosswind diffusion. The streamline-upwind 
Petrov-Galerkin (SUPG) principles were first applied to the linear 
advection equation and the incompressible Navier-Stokes equations by 
Hughes and Brooks [36]. Extension of the SUPG concepts to the com- 
pressible Euler equations are detailed in [37]. A major drawback of 
the initial SUPG method was the absence of gradient controls in direc- 
tions other than the streamline. 

2.4.1 Entropy Variables 

An enhancement to the SUPG concept was developed by Hughes, 
et al. [14, 16], which improves the shock capturing caoabilities of the 
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finite element formulation. The enhancements include writing the Euler 
equations in terms of entropy variables and the use of a "shock - 
capturing" operator. The use of entropy variables lead to consistent 
error estimates and also results in a system of symmetric equations 
which are amenable to error analysis. 

The Euler equations given by Eq. (2.12) can be written as, 


U t + A. U . = 0 

, U I 9 1 

where A-j are unsymmetric Jacobian matrices 
tions given by. 


(2.43) 

in the co-ordinate direc- 


A i = F i,U (2.44) 

The form of matrices A-j for the 2D Euler equations appear in Appendix 
B. The matrices A-j are seen to be unsymmetric. To obtain 
dimensionally consistent stable results and to obtain entropy 
conservation from a weak formulation, the Euler equations are 
symmetrized using entropy functions [38]. A change of variables is 
introduced by defining new independent variables V to replace the 
conservation variables U. A one-to-one mapping is assumed between U 
and V. Equation (2.43) transforms to. 


A. V 




+ A i 


,1 


= 0 


(2.45) 


where 

A o - U ,V 

A i = F i,V = A i A o (2.46) 

Aj are the transformed Jacobian matrices, see Appendix B. The 
definition of variables V is such that the transformed Jacobian 
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matrices are symmetric, and matrix A 0 is positive definite and 
symmetric. This condition is satisfied by using the generalized 
entropy function H ( U ) and defining the variables V as, 



,U 


(2.47) 


Hughes et al. [14], use the physical entropy s, in the definition of 
the entropy function H as. 


H(U) = -ps (2.48) 

The new variables V, denoted as entropy variables, are thus defined as. 


V = 


pi 


-U 4 + pi (y + 1 

U 2 

U 3 

-U, 


where s is the entropy given by. 


s) 1 


s = In h - .1.) P* 


“1 


and 


pi = U. - 


U 1 + u| 


1U 


1 


(2.49) 


(2.50) 


(2.51) 


The entropy variables and the Jacobian matrices that appear in 
this dissertation are for 2D inviscid flows. The Euler equations based 
on entropy functions can be extended directly to 3D inviscid flows, and 
the coefficient matrices are given in [39]. 


2.4.2 Finite Element Formulation 

The discretization of the domain of interest and application of 
the method of weighted residuals to Eq. (2.45) results in the relation. 




where N is a matrix of element interpolation functions. For the Euler 
equations, the use of the simple Galerkin (or Bubnov-Galerkin, as it is 
sometimes referred to) method results in a formulation that conserves 
entropy. To account for the entropy production at shocks and 
discontinuities, the Petrov-Galerkin formulation uses weighting func- 
tions different from the interpolation functions N. 

The weighted residual equations for the Euler equations become, 

/« T <A„ V +A f V"' 0 ' (2.53) 

where W is the weighting function. The definition of W is such that 
compressible flows with flow characteristics including shocks, contact 
surfaces and expansions are modelled accurately. The weighting func- 
tion proposed in [16] contains three components: (a) a streamline 
operator, (b) a discontinuity capturing operator, and (c) a reduced 
discontinuity capturing operator. The need for these operators can be 
illustrated clearly by application to the advection equations. 
Appendix C details the form of the three operators and the dissipation 
added by each operator. 

For the compressible flow equations the discontinuities are shocks 
and the three operators can be tagged as the streamline operator, the 
shock capturing operator and the reduced shock capturing operator. The 
streamline operator adds diffusion to capture discontinuities and 
suppress shock related oscillations. The addition of diffusion is 
directional to avoid overly smoothed results. The directional 
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characteristics can be obtained, from the Jacobian matrices Aj given 
by Eq. (2.46). The direction of propagation of information can be 
obtained from a set of real eigenvalues of the Jacobian matrices. The 
weighting function with the streamline operator can be written as. 


W = N + S'. N . 

1 » 1 


(2.54) 


where Sj are the streamline operator matrices given by, 

S. = A. T|a| V (2.55) 

where A is a set of eigenvalues, and T is a matrix of right 
ei genvectors . 

The use of W defined as in Eq. (2.54) can be used to predict flow 
features such as shocks, but localized shock related oscillations 
persist. To suppress these oscillations the Petrov-Galerkin 
formulation uses the shock capturing operator. The shock capturing 
operator acts normal to the shock, and this direction is indicated by 
the gradients of the entropy variables. The definition of the shock 
capturing operator is based on splitting the Jacobian matrix operator 
into two parts, a parallel part Aj 0 acting in the direction of the 
gradient and Aij_ , acting normal to the gradient. Ajn is defined as. 


aT b Z. = 0 for Z f j.V j (2.56) 

where Zj are vectors normal to the direction of discontinuity. The 
use of the parallel components Ajn in the formulation modifies the 
weighting function and W can be written as, 
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W = N + si M . + rT N . 


where R-j are the shock capturing operator matrices. 
Rj are given by. 


(2.57) 
The matrices 


R i -*i, T . Is I ' 1 t , t < 2 - 58 > 

A H is the vector of eigenvalues, and T n is the matrix of right 
eigenvectors of the transformed matrices A-j n . 

Thus the Petrov-Galerkin formulation has two components - a 
generalized streamline operator and a shock capturing operator. The 
explicit control of the gradients is obtained by the shock -capturing 
term which implies that the component of the streamline operator in the 
gradfent direction is redundant. The projection of the streamline 
operator in the direction of the gradient can be subtracted out. The 
"reduced shock capturing" term is given by. 


v i ■ A i p 


(2.59) 


where P is the projection operator which can be written as 

P = T « a T b (2.60) 
a is a scalar, and Ty is the matrix of eigenvectors that appears in 
Eq. (2.58). With the inclusion of the reduced shock capturing operator 
the weighting functions W become. 


W = N + si N,. + rT N, . + Y[ N, . 
and the weighted residual formulation is written as, 

// < fl 0 V 't + A 1 V M> dA * 0 


(2.61) 


(2.62) 
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Details of the derivation of the coefficient matrices and the 
definition of the shock capturing operators appear in [39]. 

The finite element formulations described are designed to be 
pseudo-time marching schemes. The intent is to study problems which 
could be marched out in time to steady state. This suggests an 
approximation of the transient terms in Eq. (2.62) as. 






(2.63) 


Equation (2.62) can then be written as, 



N T A n Y t 
o ,t 




(2.64) 


During the transient procedure, matrix A 0 which appears on the LHS of 
Eq. (2.64) is computed at nodes. This makes possible the solution of 
the global equations given by. 


A / N T N dA { V + } 

°J A 




dA 


(2.65) 


which can be rewritten as, 

A o M ii ( V,t =f i 1 = i. 2 -- -nodes ( 2 . 66 ) 

Equation (2.65) is valid for each node in the domain. M^j is the 
value in the global mass matrix corresponding to node i, A 0 is the 
coefficient matrix computed at node i, and fj is the corresponding 
right hand side value. The time stepping procedure that results is 
given by, 


V n+1 = v*? + 
i i 


it A' 1 f, 


i = 1,2,3. . .nodes 


(2.67) 


where is the value of the entropy variable at node i at 
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t=t n+ j, V" the nodal value at t= t n and At the time-step. Details of 
transient algorithm used in this study can be found in [39]. 

2.5 Comments on finite element algorithms 

The previous sections describe the formulations for the Taylor- 
Galerkin and Petrov-Galerkin finite element algorithms. A brief over- 
view contrasting the characteristics of the two algorithms is 
appropriate. 

Both the Taylor-Galerkin and the Petrov-Galerkin algorithms are 
based on the principle of shock capturing. They differ by virtue of 
how the necessary artificial dissipation is added. The Taylor-Galerkin 
uses explicit artificial dissipation to capture shocks while the 
Petrov-Galerkin uses implicit numerical dissipation. To simplify the 
solution procedure both algorithms use explicit time-stepping schemes. 

The Taylor-Galerkin formulation based on Taylor series expansion 
is a model of algorithm simplicity. In contrast, the Petrov-Galerkin 
formulation is rather complicated. The definition of the weighting 
functions is complex and involves numerous matrix multiplications. 
Matrix multiplications expand memory requirements and are computation- 
ally expensive. Algorithms are usually reformulated to avoid such 
operations. The evolution of the one step Taylor-Galerkin to the two 
step formulation used in this dissertation is typical of this desire to 
eschew matrix multiplications. 

The implementation of the Taylor-Galerkin algorithm is simpler 
since the element integrals that appear in the formulation can be 
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evaluated in closed form. This procedure can be done just once outside 
the time stepping procedure. The Petrov-Galerkin algorithm yields non- 
linear element integrals which need to be evaluated using numerical 
integration procedures for each time -step. 

The Petrov-Galerkin formulation has good mathematical features 
since it is conducive to accurate error analysis and error estimates, 
and in the limit of vanishing dissipation ensures conservation of 
entropy. The algorithm is based on "upwinding" and needs no "tuning" 
parameters. Reduced integration procedures on the Petrov-Galerkin 
formulation have shown encouraging trends and may result in reduced 
computational expense with accuracy of results comparable to full 
integration procedures. 

The primary evaluation criteria for both algorithms include solu- 
tion quality, shock resolution, extent of vectorization, computational 
speed, and ease of extension to multidimensional inviscid and viscous 
flows. The chapters that follow develop vectorization strategies for 
the two formulations and demonstrate the performance of the algorithms 
for inviscid and viscous flows. 



Chapter 3 

VECTORIZATION STRATEGIES FOR FINITE ELEMENT CFD 


To predict flows encountered in realistic problems accurately, two 
ingredients are essential. The first is the development of finite ele- 
ment methodologies such as the Taylor-Galerkin and Petrov-Galerkin, 
described in the previous chapter. The second ingredient is the effec- 
tive implementation of the finite element formulation on the computa- 
tional facility at hand. 

For realistic 3D problems the number of degrees of freedom needed 
for an accurate analysis is astronomical, and the size of the database 
that must be handled taxes the storage available on most computers. 
Supercomputers, with their huge central memory as well as large and 
fast secondary memory, are designed to address such storage demands. 
Driven by the insatiable needs of computational fluid dynami cists for 
more memory and higher computational speeds, the supercomputer has in a 
very short time become an essential research tool for serious CFD 
researchers. 

On a supercomputer, the desirability for low computational costs 
and rapid turn-around times dictate the need for very effective 
programming strategies, in particular strategies aimed at vectorizing 
the computational procedure. Supercomputers, such as the CRAY II, CDC 
205 and the Langley VPS-32, are all endowed with large central memory 
but differ in their hardware configuration. Vectorization strategies 
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employed on these computers should be tailored to fit each machine's 
special characteristics if optimum performance is to be approached. 

The vectorization procedure should also be global, that is, as 
much of the program as possible should be vectorized. Figure 3.1 shows 
the ratio of computational speed to maximum computational speed attain- 
able for various level of vectorization. The three curves correspond 
to computers with vector/scalar speeds of 5, 10, and 20. It can be 
seen for the VPS-32 (vector/scalar ratio of 20), a program that is 90% 
vectorized will run only at 30% of the maximum attainable speed. Vec- 
torization levels close to 100% are a must if the intent is to work 
problems modelled with elements and nodes that number in the hundreds 
of thousands. 


3.1 VPS-32 Characteristics 

The NASA Langley Research Center uses the VPS-32, a CYBER 205 with 
a central memory of 32 million full precision words, for its CFD appli- 
cations. The special features of the VPS-32 which set it apart from 
scalar machines are its virtual memory and pipeline processing 
capability. 

Virtual memory: The VPS-32 is a virtual memory computer which means 

that the user has the ability to access a virtual address range that 
exceeds the physical size of the central memory. Information not 
within the central memory is brought into central memory by the operat- 
ing system. Information, which may be working arrays or code, resides 
on blocks of data called pages. The process of retrieving pages, 
placing them in central memory, and if needed, removing other pages 
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Fig. 3.1 Ratio of computational speed to extent of program 
vectorization. 
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from the central memory is called paging. Virtual memory is useful in 
cases where a "small" problem is made "big" by change of one or more 
parameters. 

Pipeline processing: One of the principal features of the VPS-32 is 

its efficiency in handling long vectors. The reason for this 
efficiency is the concept of "pipeline processing." The arithmetic 
hardware consists of two units or pipelines, Pipe 1 and Pipe 2. Pipe 1 
is used for all vector arithmetic operations except divide and square 
root. Pipe 2 is used for all vector operations. Each pipeline is 
segmented, that is, a portion of the specific operation on two operands 
is done at the same time at the first step. The results of the first 
step are moved down to the next step of the operation while a new set 
of operands is moved into the first step. The segmented or pipeline 
construction of the vector machine allows vectors to be streamed 
through the pipeline at high rates. A good measure of effective vec- 
torization on the VPS-32 is obtained by looking at the timing informa- 
tion for various operations. The vector timing for operands of length 
n is given by, 

T = S + n/£ 

where T is the time in seconds, S the startup time, and £ the number of 
results obtained per operation. For example, the multiplication opera- 
tion has the timing information given by, 

T = 52 + n/2 

where l is taken to be 2 for the VPS-32. 

It is seen that for a fixed number of operations, the longer the 
vector the more efficient the computational procedure., This is due to 
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the fact that each vector operation requires a startup time or overhead 
time to begin operation, but after the first result the succeeding 
results occur at very high rates. The most effective rates are when n 
is very large and close to optimum rates are achieved for vector 
lengths in excess of a thousand. 

3.2 Strategies for Implementation of Finite Element CFD 

The characteristics of the VPS-32 indicate that a finite element 
procedure should be implemented such that most, if not all, operations 
are done with long vectors. Vectorization procedures for implicit 
algorithms designed to take advantage of this concept are detailed by 
Noor and Lambiotte [40]. The procedure advocated in [40] works well 
for dynamic analysis of structures using higher order finite elements. 
The vectorization procedure takes advantage of the large number of 
nodes per element and the need for a large number of integration 
points. This procedure is not well suited for fluid flows. Finite 
element compressible flow programs use simple elements (3 or 4 nodes/ 
element in 2D) and numerical integration of order 2 is usually 
adequate. 

Finite element procedures for simulating compressible flow fea- 
tures usually employ explicit time -stepping procedures. The explicit 
solution of the global equations is based by computing element matrices 
that occur in equations such as Eqs. (2.19) and (2.64). The left-hand 
and right-hand sides of Eqs. (2.19) and (2.64) are assembled resulting 
in a global system of equations given by Eq. (2.67). The right hand 
side vectors on an element level are called element residual vectors. 
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and the assemblage of the element vector results in the global residual 
vector. The solution of the system equations, Eq. (2.67) is repeated 
for each time-step until the conservation variables are perceived to 
have attained steady-state. 

Typical finite element formulations contain processes that are 
element -to -node operations and node -to -element operations which are not 
easily vectorizable. These hard-to-vectorize processes include: 

(a) element localization 

(b) computational of element residuals 

(c) assembly of global residual vectors 

(d) solution of the global equations 

(e) application of boundary conditions 

(f) Gauss integration 

Tasks (a) to (f) are all critical since they are repeated at each 
timestep. Effective processing rates are obtained only when all these 
operations are highly vectorized. The vectorizing strategies that 
follow were developed to exploit the hardware and software capabilities 
of the VPS-32. However, the overall vectorization strategy can be used 
to implement finite element formulations effectively for compressible 
flow analyses on other supercomputers such as CRAY machines. 

3.2.1 Element Localization 

In finite element procedures the region defined by the flow field 
is discretized into elements. The node numbers associated with an 
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element are contained in a row of the element connectivity matrix. The 
coordinates of the nodes defining an element, the value of the 
variables at these nodes, and other characteristic element quantities 
are obtained from global arrays using the connectivity row. This 
process is termed as localization. The vectorization of the 

localization process is simplified by the use of the Fortran supplied 
"gather" function on the VPS-32. The gather function generates a 
vector of real or integer numbers from a vector of the same type using 
an integer index vector. The use of the "gather" is- illustrated in 
Fig. 3.2(a). The index vector i is used to gather appropriate values 
of vector u into the vector v. 

The program structure of the scalar and vectorized versions for 
the process of obtaining element nodal coordinates is shown in Fig. 
3.2(b). The element connectivity matrix plays the role of index 
vectors to obtain the "right" nodes for the localization process. For 
the scalar version, the index vector is a row of the connectivity 
matrix which is of length equal to the number of nodes per element. 
The index vector on the vectorized version is a column of the 
connectivity matrix which is of length equal to the number of elements 
in the discretized domain. Since realistic problems use elements that 
number in the thousands, the vectorized version of the localization 
procedure is highly efficient. The procedure to obtain other element 
quantities, such as element nodal variables, is vectorized along 
similar lines. 

3.2.2 Computation of Element Residuals 

Finite element scalar codes generate the global residual vector by 
the following sequence of operations: (1) evaluate the stiffness 
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matrix for an element, (2) compute the element residual matrix using 
the vector of element nodal variables, (3) assemble the element 
residual matrix into the global residual vector using the connectivity 
array, and (4) repeat steps (1) to (3) for each element. For most 
finite element programs, the generation of the element residual 
matrices and the assemblage of these residual matrices into the global 
residual vector lays a heavy demand on the computational resources 
available. 

Generation of the element stiffness matrices is vectorized by 
using the vectors obtained from the localization process. The arrays 
that result from vectorization of the localization process of 3.2.1, 
are of length equal to the number of elements in the discretized 
domain, and use of these vectors enables vectorization of the element 
stiffness computations. To gain further insight into how the scalar 
and vectorized versions for generating the element stiffness matrices 
differ, consider the "chips and soda straws" analogy shown in Fig. 
3.3. The scalar version bears resemblance to the stack perceived in a 
can of Pringles potato chips. The ordering is horizontal and each chip 
can be considered as an element stiffness matrix. However, the 
vectorized version generates the long vectors and the ordering is 
vertical, reminiscent of a stack of soda straws. Element stiffness 
matrices generated by this procedure are used to compute the residual 
matrices for all elements. 

The assembly of the element residual matrices into a global resi- 
dual matrix is vectorized with the aid of the "scatter" function avail- 
able on the VPS-32. The scatter operation is the reverse of the gather 
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Fig. 3.3 Illustration of differences in scalar and vectorized versions 
for generating element matrices. 
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operation and is illustrated in Fig. 3.4. The index vector "i " is used 
to scatter the appropriate values of vector u into the vector v. It is 
to be noted that if the index vector has repeating numbers in it, the 
value indicated by the first index is overwritten by that pointed to by 
the second repeated index. In the example in Fig. 3.4 "a" was over- 
written by "d" as the first value in vector v. 

3.2.3 Assembly of Element Residual Vectors 

Finite element meshes can be either "structured" or "unstruc- 
tured." A structured mesh results when the node numbering in a mesh is 
"well ordered" and the region modelled is of simple geometry. A struc- 
tured mesh and its connectivity appear in Fig. 3.5. The node numbering 
for the finite element mesh is such that the connectivity starts at the 
node on the lower left hand corner of the element and goes counter- 
clockwise. The vectorized version, as indicated earlier, uses the 
columns of the connectivity matrix as index vectors for the assemblage 
process. For a structured mesh, a node number appears but once in the 
same column, which reduces the assemblage to simple scatter operations. 

Typical finite element meshes are "unstructured." This implies 
that the number of elements connected to a node is not constant within 
the domain. An unstructured mesh and its connectivity appear in Fig. 
3.6. It is seen that in column 3 of the connectivity matrix node 11 
appears twice. Using a simple scatter with this column as the index 
vector will produce erroneous results. This is due to the fact that 
the value pointed to by the second index in that column is overwritten 
by the value pointed to by the fifth index in the same column. This 
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Fig. 3.4 Illustration of the scatter operation. 
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Fig. 3.6 Unstructured finite element mesh and associated matrices. 
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loss of information can be prevented by the use of a 
"recursi ve -scat ter. " 

The recursi ve-scatter is a multipass scatter, the number . of 
passes needed being equal to the number of times a node is repeated in 
a column of the connectivity array. The number of times a node appears 
in a column is tagged by an integer array, denoted as the "events 
counter" array. When a node repeats twice a value of 2 is entered in 
the events counter array corresponding to the location of that node in 
the connectivity array. The scattering process uses the events counter 
array to ensure that no information is lost or overwritten. The events 
counter array for the unstructured mesh in Fig. 3.6a appears in Fig. 
3.6b. The assemblage procedure using the recursive-scatter concept is 
detailed in Fig. 3.7. 

The assembly process uses columns of the residual arrays, the 
length of a vector being equal to the number of elements in the 
domain. For the scatter process, columns of the connectivity array 
serve as index vectors. Node 11 appears in column 3 twice and two 
passes or iterations will have to be made to properly assemble the 
residual matrices corresponding to this column of the connectivity 
matrix. At the first pass the first occurrences of all the nodes are 
used as pointers to assemble element residual vectors. During this 
first pass, the second occurrence of nodes are assigned to point to a 
dummy location. On the second pass, the second occurrence of nodes are 
used as pointers for assemblage while the first occurrences are dumped 
into the dummy location. For an assembly process that needs two 
passes, such as for the mesh on Fig. 3.6, the results obtained from the 
two passes are added. The event counter matrix thus permits the use of 
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an unstructured mesh but makes sure that the assemblage process is done 
consistently. Figure 3.7 also shows the erroneous results that might 
accrue using a simple scatter on an unstructured mesh. 

The section of the code which computes the events counter array 
is not vectorized. The penalty to be paid for this scalar operation is 
quite small since the generation of the events counter array is done 
outside the time loop. The use of the events counter array enables the 
assemblage procedure, which has to be repeated for each time-step, to 
be fully vectorized. The scalar and vectorized versions of the process 
of generating element stiffness matrices and their assemblage appears 
in Fig. 3.8. 

3.2.4 Solution of Global Equations 

Element equations, such as Eqs. (2.19) and (2.64), are typically 
evaluated for each element and then assembled into global arrays 
according to the locations defined by the connectivity array. The 
assembled equations are of the form 

M.. 6U. = R. (3.1) 

where M-j-j is the element in the global lumped mass matrix correspond- 
ing to node i, 6U-j is the solution vector corresponding to node i, 
and R-j is the corresponding nodal residual. The element lumped mass 
matrix is calculated using closed form integration and then assembled 
to form the left hand side of Eq. (3.1). The assembly process is done 
using the events counter array for unstructured meshes. 
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Fig. 3.8 Scalar and vectorized versions for generating global 


residuals 
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A multipass iterative scheme can be used to include the contribu- 
tions of the non-diagonal terms in the consistent mass matrix [4]. The 
inclusion of the non -diagonal terms improves solution quality for 
structural analysis, but this trend has not been established for com- 
pressible flow calculations. 

The use of vectors of length equal to the number of elements in 
the domain, and the use of the scatter, or the recursive-scatter if 
need be, ensure the procedure of calculating the global mass matrix to 
be fully vectorized. The solution of the system equations is then a 
simple vector division operation. 

3.2.5 Application of Boundary Conditions 

Inviscid analysis of flow fields include processing nodes that may 
have constraints such as slip and no penetration. The vectorization 
procedure uses gather operation to grab specific nodes, processes 
these nodes, and then scatters them back into the global arrays. Since 
the solution process is explicit, the imposition of the boundary condi- 
tions is done after the solution of the global equations. Use of 
gather and scatter functions ensure effective vectorization of this 
process. 

3.2.6 Gauss Integration 

Element integrals that are complex or those that contain nonlinear 
quantities cannot be evaluated in closed form and must be evaluated 
using numerical integration techniques. The Taylor-Galerkin artificial 
dissipation terms that appear in Eqs. (2.30), (2.34) and (2.40), and 
the Petrov-Galerkin element integrals in Eq. (2.64) are evaluated using 
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numerical integration. The numerical evaluation of these integrals 
replaces a typical integral by a summation process. The most popular 
evaluation procedure for finite element integrals is Guass -Legendre 
quadrature [41]. The number of sampling, points used for the element 
evaluation depends on the degree of the polynomial in the integral. If 
the element integrals involve complex expressions, numerical experi- 
ments may have to be done to determine the number of integration points 
needed for sufficient accuracy. 

To illustrate the strategy to effectively vectorize the process of 
numerical integration, the generation of a global stiffness matrix is 
explained. For the scalar version the process is to: (a) evaluate the 
contribution of the element stiffness matrix due to one Gauss point,- 
(b) add up the contributions to a element stiffness matrix from all the 
Gauss points, (c) assemble the element stiffness matrix into the global 
stiffness matrix, and (d) repeat this procedure for each element in the 
domain. 

The vectorized version of this procedure turns the scalar concept 
inside out. The inner loop, the loop over the Gauss points, becomes 
the outer loop for the vectorized version. The flowchart for the two 
versions is illustrated in Fig. 3.9. The vector lengths used in the 
numerical integration is of the order of the number of elements in the 
domain which implies good vectorization. 

3.3 Comments on Vectorization Strategies 

This chapter introduced strategies for vectorizing explicit time- 
marching finite element algorithms for compressible flow. The 
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Integral evaluations using Gauss quadrature. 


48 


procedures described were used to develop vectorized Taylor-Galerkin 
and Petrov-Galerkin programs. The programs developed include 2D 
inviscid, 2D viscous and 3D inviscid flow codes. In the following 
chapters these finite element programs are used to detail inviscid and 
viscous features for a variety of flow problems. The results obtained 
from these analyses are used to compare and contrast the performance of 
the Taylor-Galerkin and Petrov-Galerkin algorithms. 



Chapter 4 

COMPUTATIONS FOR 2D INVISCID FLOWS 


The Taylor-Galerkin and Petrov-Galerkin finite element formula- 
tions described in Chap. 2 were applied to 2D inviscid flow problems to 
assess solution accuracy, shock resolution, convergence rates, and com- 
putational speed. The Taylor-Galerkin formulation uses explicit 
artificial dissipation and the performance of the three dissipation 
models introduced in Chap. 2 are compared. The use of local time- 
stepping schemes to stimulate faster convergence rates is also 
investigated. 

4.1 Performance of Artificial Dissipation Models 

The performance of the three dissipation models for the Taylor- 
Galerkin algorithm are compared by solving a flow problem with an exact 
solution. Mach 3 inviscid flow over a compression corner is chosen for 
this purpose. The flow parameters after the shock can be computed 
exactly by using oblique shock relations. The flow variables for the 
entire flow field for the compression corner are shown in Fig. 4.1a. 
The top boundary is assumed free, and slip boundary conditions are 
imposed along the wall as shown in Fig. 4.1b. The finite element mesh 
used for comparing the performance of the three dissipation models 
contains 1239 nodes and 1160 elements, Fig. 4.1c. 
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Fig. 4.1 Flow configuration and finite element mesh for compression 
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The density contours obtained using the Lapidus dissipation model 
appear in Fig. 4.2. A dissipation constant of v = 1 was used for the 
computations. The contours show the presence of spurious oscillations 
at the root of the oblique shock, and an inflow of spurious information 
from outside the domain is seen at the free top boundary. The free- 
stream oscillations show an undershoot of 1 % (1.23 instead of 1.4). 

The density contours obtained using the MacCormack -Baldwin dissi- 
pation model are shown in Fig. 4.3. The use of the MacCormack -Baldwin 
dissipation model results in a dramatic reduction in the freestream 
oscillations. The results also indicate the absence of the spurious 
oscillations at the top of the region. The shock structure is seen to 
be crisper with the contour lines coming together at the shock. A 
small undershoot of about 2 % is seen close to the shock. 

The density contours obtained using the Jameson dissipation model 
with dissipation coefficients of e^ = 1 and e^ 4 ^ = 1/64 are shown 
in Fig. 4.4. The blended dissipation model of Jameson yields results 
that show lesser oscillations than the Lapidus model, and the 
freestream oscillations are limited to less than 1 % ( 1.39 instead of 
1.4). The contours in Fig. 4.4 also indicate the inflow of spurious 
oscillations from outside the domain at the top. 

A further comparison of the solutions obtained by the use of the 
three dissipation models is shown in Fig 4.5. The density distribution 
at the outflow shows the superior results obtained by the MacCormack - 
Baldwin and Jameson dissipation models. The results obtained by the 
Jameson model also show better control of post-shock freestream oscil- 
lations. The Lapidus dissipation model shows the presence of shock 
related oscillations and the overall solution accuracy is quite poor. 




Fig. 4.2 Density contours for compression corner using Lapidus 
dissipation. 
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Fig. 4.4 Density contours for compression corner using Jameson 
dissipation model. 
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The results obtained for this model problem indicate that the 
Lapidus model produces spurious oscillations at the shock and in the 
freestream. The results obtained using the MacCormack-Baldwin and 
Jameson operators are virtually identical. The Jameson model reduces 
freestream oscillations and improves convergence rates. The disadvan- 
tages of the Jameson model include the need for the expensive second 
and third derivative evaluations of the conservation variables. The 
scheme introduces two "tuning" parameters which may indicate the need 
for parametric studies to find the "right" values for the two constants 
for each problem of interest. The MacCormack-Baldwin dissipation model 
shows the least oscillations in the freestream, and the accuracy of the 
procedure is uniformly good. The results presented in the remainder of 
this chapter for the Taylor-Galerkin formulation were obtained using 
this dissipation model. 

The use of triangular elements instead of "quad" elements for the 
Taylor-Galerkin formulation using dissipation a la Lapidus has shown a 
reduction in freestream oscillations [4]. For the triangular elements, 
the use of the MacCormack-Baldwin dissipation model has shown no signi- 
ficant advantages over the Lapidus model [42]. The contours of Figs. 
4.2 and 4.3 suggest the Taylor-Galerkin formulation with quad elements 
is sensitive to the artificial dissipation model used. For quad ele- 
ments significant improvements in solution quality are obtained using 
the MacCormack-Baldwin dissipation model. 

4.2 Local Time-stepping Scheme 

Many problems of concern to the aerodynamicist are steady state 
problems wherein the transient behavior is of little interest. The use 
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of time-steps that depend on the local flow conditions and geometry can 
help accelerate convergence. The allowable time-step varies throughout 
the mesh and is locally constrained by the CFL condition [43]. The CFL 
condition relates the propagation of information within the mesh to the 
grid spacing of the mesh. In two space dimensions the CFL condition 
limits the local time-step to (a t )cpi_ given by, 
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where u-j are the velocity components in the coordinate directions, 
ax-j the grid spacing in the coordinate directions and c is the local 
speed of sound. 

Finite difference methods use i-j grids where the geometry data is 
completely structured. Fig 4.6a. For finite elements the orientation 
of an element is random and the definition of ax-,- is not 
straightforward. For finite element meshes a better approach is to 
compute A? and An as defined in Fig. 4.6b, where s and n are the local 
directions that depend on the orientation of the element. The CFL 
condition can then be written for a typical finite element as, 
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where U-f are the local velocity components. The velocities u-j can 
be obtained by a transformation of the velocity components in the 
coordinate directions given by. 
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(b) Typical quad element 


Fig. 4.6 Definition of grid spacings for finite difference and finite 
element meshes. 
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where 9 and 8 are the inclination angles of the local directions with 
the global coordinate axes. 

The allowable time-step for an element is calculated from the 
relation. 


(At) e i e ■ ° (At J qpl (4.4) 

where a is a safety parameter that depends on the algorithm of choice. 
Finite element algorithms typically need local time-steps defined at 
the nodes in addition to those defined for the elements. The local 
time-step at a node is calculated based on the minimum element 
time-step of all the elements surrounding that node. For a typical 
node i, the local time-step is calculated from, 
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j * 1, nel 

where nel is the number of elements surrounding node i. 

The benefits of using this time-stepping procedure are demon- 
strated by predicting the Mach 3 flow over the compression corner of 
Fig. 4.1. The artificial dissipation model used is the MacCormack- 
Baldwin model with a dissipation constant v = 1. Figure 4.7 contrasts 
the density contours for the Taylor-Galerkin procedure with global and 
local time-steps. The use of local time-steps is seen to reduce pre- 
shock oscillations at the top right corner of the flowfield. Figure 
4.8 plots the density distribution at the outflow and indicates a 
sharper shock with the local time-stepping scheme with the shock being 
captured within 5 nodes instead of within 7 nodes. The use of local 


time-steps results in faster convergence rates as indicated by Fig. 
4.9. The local time-stepping causes the L£ norm of the density 
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Fig. 4.9 Comparative convergence rates for compression corner using 
local and global time-stepping schemes. 
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changes to drop over two orders of magnitude within 200 iterations, 
compared to the 400 iterations needed for the global time-stepping pro- 
cedure. 


4.3 Evaluation of Inviscid Formulations 

The Taylor-Galerkin formulation implemented with the MacCormack- 
Baldwin dissipation model and the local time-stepping procedure de- 
tailed in the previous section is compared with the Petrov-Galerkin 
formulation for a variety of problems. The Petrov-Galerkin formulation 
is also implemented with the local time-stepping scheme to provide a 
better basis for comparison. The evaluation of the finite element for- 
mulations is based on criteria which include solution accuracy, shock 
resolution, spurious oscillation control, computational speed, and 
storage requirements. 

The problems that are used for the evaluation consist of: (1) 
Mach 3 flow over a compression corner, (2) Mach 6 expansion over a 
sharp corner, (3) interaction of a scramjet exhaust with the free- 
stream, and (4) Mach 6.57 flow over a blunted leading edge. 

The compression corner and the Prandtl -Meyer expansion were chosen 
due to the availability of exact solutions. In addition to illus- 
trating basic features of compressible flows such as shocks and expan- 
sions, these two problem helped validate the computer programs. The 
interaction of a scramjet exhaust with the freestream and the Mach 6.57 
flow over the blunted body are chosen because of the availability of 
solutions by other numerical methods such as finite volume or finite 
difference methods. These problems are also typical of the flow 
situations encountered in realistic problems. 
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The Taylor-Galerkin and the Petrov-Galerkin formulations were used 
to predict the flow behavior for the problems listed above. The 
Taylor-Galerkin formulation with the MacCormack-Baldwin dissipation 
model uses a dissipation constant of \> = 1 for the first three prob- 
lems. Results obtained for the hypersonic flow over the blunt leading 
edge indicated the need for a dissipation constant of v = 2 to suppress 
shock related oscillations. 

4.3.1 Compression Corner 

The density contours for the compression corner using the Taylor- 
Galerkin and Petrov-Galerkin formulations appear in Fig. 4.10. The 
results obtained for the Taylor-Galerkin formulation indicate the 
smeared shock along the wall and other flow details that have been 
discussed in section 4.2. Results obtained using the Petrov-Galerkin 
formulation indicate the absence of spurious oscillations. The shock 
obtained is very crisp as indicated by the contour levels running close 
together, and the density at the wall shows no oscillations. 

The distribution of density along the wall and at the outflow 
plane for the two methods are compared in Fig. 4.11. Both methods show 
good shock capturing properties with the shock defined within five 
nodes. The Taylor-Galerkin algorithm exhibits a little undershoot at 
the corner and spurious oscillations occur along the wall. The pres- 
ence of these oscillations is seen clearly by the distributions of Fig 
4.11b. The results obtained by the Petrov-Galerkin formulation show no 
such oscillations, and the density at the wall compares exactly with 
analytical solutions. 
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4.3.2 Prandtl -Meyer Expansion 

The capability of the finite element formulations for detailing 
expansion waves is illustrated by predicting the features of a Mach 6 
flow over a 10° corner. The flow parameters after the expansion can be 
obtained from isentropic relations. The flow configuration and the 
boundary conditions for the region are shown in Fig. 4.12a. A finite 
element mesh containing 1800 quad elements and 1920 nodes is used to 
predict the effects of expanding the Mach 6 flow through 10°. 

The density contours for the Taylor-Galerkin and Petrov-Galerkin 
formulations appear in Fig. 4.13. The contours for the Taylor-Galerkin 
indicate the presence of a few oscillations at the root of the expan- 
sion fan and an overshoot of about S% in the freestream. The density 
contours for the Petrov-Galerkin formulation show very little oscilla- 
tions. The contour levels indicate a smooth transition through the 
expansion fan. The root of the expansion fan is smeared along the 
wall, but the smearing is not as severe as that obtained using the 
Taylor-Galerkin formulation. 

The distributions of density at the outflow plane and along the 
wall for the two methods appear in Fig. 4.14. The distributions at the 
outflow plane indicate the smooth transition of values from the free- 
stream to the values at the tail of the expansion fan. The presence of 
the slight overshoot mentioned earlier for the Taylor-Galerkin formula- 
tion is seen more clearly from this distribution plot. The density 
distributions along the wall indicate the presence of a kink at the 
corner for the Taylor-Galerkin. The results of the Petrov-Galerkin 
formulation indicate the presence of a few oscillations near the 
corner, but the expansion predicted at the wall is smoother. 




(b) Finite element mesh 


Fig. 4.12 Flow configuration and finite element mesh for Prandtl -Meyer 
expansion. 
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(a) Density distributions at the outflow 



X 

(b) Density distributions along the wall 


Fig. 4.14 Comparative density distributions at the outflow and along 
the wall for the Prandtl -Meyer expansion. 
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Adequate resolution of the root of the expansion fan is critical 
for obtaining accurate solutions throughout the flowfield. The distri- 
butions of Fig. 4.14 indicate the need for more refinement- at the 
corner to capture essential details of the expansion. 

4.3.3 Scramjet Exhaust Flow 

The scramjet has come under renewed scrutiny due to its potential 
applications in hypersonic research vehicles. The exhaust from the 

scramjet engine interacts with the freestream at the exit producing a 

1 

i 

shear layer and a shock; the predominant flow features that result due 
to this interaction are shown in Fig. 4.15a. The finite element mesh 
used to predict the flow feature appears in Fig. 4.15b. .The mesh con- 
tains 2100 elements and 2226 nodes and is refined at regions where the 
gradients of the flow variables are expected to be large. 

The density contours obtained for the finite element formulations 
appear in Fig 4.16. The expansion through the nozzle exhaust is indi- 
cated by the contour levels (1-0). The interaction of the expanded 
flow with the freestream at the bottom results in the shock and shear 
layer shown. The figures indicate the good shock capturing capabili- 
ties of both methods. The Taylor-Galerkin formulation shows a few 
localized oscillations at the corner where the flow from the nozzle 
exit interacts with that from the freestream. The shock location for 
the Petrov-Galerkin is seen to be sharper and closer to the shear 
layer. Details of the flowfield can be better shown by the distribu- 
tion of quantities such as pressure and Mach number at the outflow. 

Figure 4.17a compares the distribution of pressure at the outflow 
for both methods and the difference in the shock location is clearly 
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Fig. 4.15 Flow configuration and finite element mesh for scramjet 
exhaust interaction. 
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(a) Taylor-Galerkin contours 



(b) Petrov-Galerkin contours 


Fig. 4.16 Comparison of density contours for the scramjet exhaust 


interaction. 
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seen. The shock location predicted by the finite difference program 
SEAGULL [44] is also shown for comparison with the finite element 
results. The Taylor-Galerkin formulation is seen to be in excellent 
agreement with the finite difference code for the shock location while 
the Petrov-Galerkin formulation predicts a weaker shock shifted closer 
to the shear layer. 

The distribution of Mach numbers along the outflow plane is 
plotted in Fig. 4.17b. The sharp drop in the Mach number at around y = 

1.4 indicates the location of the shear layer. Flow exhausting from 
the scramjet interacts with the freestream along the line defining the 
shear layer. 

4.3.4 Blunt Leading Edge 

The Aerothermal Loads Branch at the NASA Langley Research Center 
uses the 8’ High Temperature Tunnel to test a variety of structural 
configurations. The panel holder used in testing has a blunt leading 
edge, and the finite element formulations are used to simulate the flow 
over the blunt section. The geometric configuration and the finite 
element mesh used for the analysis is shown in Fig 4.18. 

Figure 4.19 shows the density contours obtained for the Taylor- 
Galerkin and Petrov-Galerkin formulations. The location of the shock 
in front of the leading edge and the shock location at the outflow are 
seen to be radically different for the two methods. The density con- 
tours for the Tayl or-Galerkin formulation are smooth, and no post-shock 
oscillations are visible. The shock standoff distance is seen to be 
considerably smaller for the Petrov-Galerkin algorithm. The density 
levels at the body, especially at the outflow, is lower than that 
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Fig. 4.17 Comparative Pressure and Mach number distributions at 
outflow for the scramjet exhaust interaction. 
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Fig. 4.18 Flow configuration and finite element mesh for Mach 6.57 flow 
over a blunt body. 
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(b) Petrov-Galerkin contours 


Fig. 4.19 Comparison of density contours for blunt leading edge 
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predicted by the Taylor-Galerkin formulation. The presence of 
post-shock oscillations for the Petrov-Galerkin formulation is also 
evident from the density contours. 

The shock locations predicted by the two finite element 

formulations along the centerline are compared wi tn the shock location 
given by the empirical relation of Billig [45] in Fig 4.20a. The 
Taylor-Galerkin formulation is seen to predict the shock location 
better than the Petrov-Galerkin. The comparison of the u-velocity 
component at the outflow for the two finite element formulations 
appears in Fig. 4.20b. The results obtained by the finite element 
methods are also compared with the finite volume results of Walters 
[46] for the same problem. The Taylor-Galerkin results show a few 
oscillations close to the surface of the body. The location of the 
shock predicted by the Taylor-Galerkin and the finite volume method are 
close, but the shock location of the Petrov-Galerkin formulation is 
clearly in error. The velocity at the body for the Petrov-Galerkin 
formulation is also higher which can be related to the low density 
levels at the wall . 

Recently a modification of the Petrov-Galerkin formulation was 
proposed [18] to make the method more "conservative." Better 

conservation of the flux quantities are obtained by integrating by 
parts the convective term in the weighted residual formulation. The 
weighted residual formulation is given by Eq (2.64) as, 

l N T *0 v -t dA - -/ A wT N V M dA (4.6) 

The use of Eq. (4.6) ensures conservation as long as the numerical 
integration procedure is of sufficient accuracy. The loss of 
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(a) Velocity distributions along centerline 
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(b) Velocity distributions at the outflow 

Fig. 4.20 Comparative velocity distributions along the centerline and 
at the outflow for the blunt leading edge. 



80 


conservation is of the same order as the error in approximations using 
numerical quadrature. To circumvent the possibility of loss of conser- 
vation the weighted residual formulation can be recast by using an 
integration by parts procedure. The balance law for the Euler 
equations defines the transformed flux quantities F-j as, 

^i,i ~ N ^’i (i not summed) (4.7) 
Using Eq. (4.7) the advective terms can be written as, 

// f i,i * * -f/’i <« + / s " T dS (4.8) 

The use of the integration by parts procedure ensures conservation of 
the advective flux especially when approximate integral evaluation pro- 
cedures are used. 

Researchers at Stanford University have used the Petrov-Galerkin 
formulation with this modification to predict the flow over the hyper- 
sonic blunt body for the flow parameters and the finite element mesh 
given in Fig. 4.18. The density contours obtained for the revised 
Petrov-Galerkin formulation appear in Fig. 4.21. The contours indicate 
a very well defined shock and the absence of oscillations throughout 
the flowfield. A comparison of the shock standoff distance at the 
centerline appears in Fig. 4.22 and compares very well with the predic- 
tion of Billig [45]. The results obtained from the Petrov-Galerkin 
formulation are also compared with an interpolation between the finite 
difference results for Mach 6 and Mach 8 flows [47]. 

The distribution of the u -velocity component at the outflow for 
the Petrov-Galerkin formulation is compared with the results of Walters 
[46] in Fig 4.23. The results indicate excellent agreement with the 








finite volume 

method of lines (M=6.8) 



Fig. 4.23 Comparative velocity distributions at the outflow for the 
blunt leading edge using a modified Petrov-Galerkin 
formulation, Ref. [18]. 







finite volume method. The shock at the outflow is seen to be captured 
within 3 nodes without any post-shock oscillations. 

The results of the modified Petrov-Galerkin method indicate the 
need to use this modification for compressible flow calculations, 
especially for flows where conservation of the advection flux is criti- 
cal. The modifications to the Petrov-Galerkin algorithm do not seem to 
present any special hardships for vectorization. A preliminary inves- 
tigation of the modified Petrov-Galerkin formulation indicates minor 
changes in the programming strategy and computational speed and a major 
improvement in solution quality and accuracy. 

4.4 Evaluation of Programming Strategy 

The merits of the vectorization procedures of Chapter 3 are high- 
lighted by Table 4.1 which compares the computational speed for the 
scalar and vectorized version of the Taylor-Galerkin algorithm. Three 
flow problems, ranging from a problem with 100 elements to one with 
over 1000 elements, are used to quantify the effectiveness of the vec- 
torization procedure. The scalar program was run on a CDC 855 and the 
vectorized version was run on the VPS-32. The figures in Table 4.1 
indicate that the larger the number of elements in a flow analysis the 
bigger the computational benefits of using the vectorization procedure. 

The performance evaluation of the finite element programs involves 
issues such as computational speed, storage requirements, and speed of 
convergence. Effective programming strategies on the VPS-32 include, 
in addition to vectorization schemes, factors such as efficient use of 
central memory, organization of data structures, and use of virtual 


memory . 
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Table 4.1 

Comparison of Computational Rates for Scalar and 
Vectorized Versions of Taylor-Galerkin Algorithm 


PROBLEM 


CPU SECS 

/ 


N is the number 
of elements 

SCALAR CODE 
CY 170-855 

VECTOR CODE 
CYBER 205 

SCALAR 

VECTOR 

Shock tube 
N = 100 

234 

3.5 

68 

Wedge 
N = 672 

4047 

14 

280 

Woodward 
Collela 
N = 1008 

1962 

6 

330 
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The computational speed of a compressible flow program is usually 
rated as the processing time in CPU time per node per time-step. Typi- 
cal computational speeds of the Taylor-Galerkin and Petrov-Galerkin 
formulations are: 

Taylor-Galerkin - 3.3 x 10"5 CPUs/time-step/node 
Petrov-Galerkin - 9.1 x 10 “® CPUs/time-step/node 

The results show that the Taylor-Galerkin program is about three 
times faster than the Petrov-Galerkin program. The numerical integra- 
tion procedure for the evaluation of the element matrices needed for 
the Petrov-Galerkin algorithm at each time -step is the reason for its 
higher computational times. Yet, the processing rate for the Petrov- 
Galerkin formulation with the four point Gauss integration is seen to 
be competitive with the Taylor-Galerkin procedure. If one point inte- 
gration procedures could be developed for accurate integral evalua- 
tions, the computational speed of the two formulations would be compar- 
able. 

On the VPS-32 it is possible to use a timing package to obtain 
information regarding the CPU time spent in each subroutine or in spe- 
cific operations within a subroutine. Information obtained from the 
timing package can be used to identify the most expensive operations in 
the formulation. Strategies can then be developed to improve the 
method of programming those expensive operations. 

Figures 4.24 and 4.25 illustrate the flowchart for the Taylor- 
Galerkin and Petrov-Galerkin algorithms. The timing data for the main 
operations of both algorithms appear in Tables 4.2 and 4.3. For the 
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Fig. 4.24 Program flowchart for inviscid Taylor-Galerkin algorithm 
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Table 4.2 

Timing Data for Principal Operations for Inviscid 
Taylor-Galerkin Algorithm 


OPERATION CPU TIME 

(%) 

Input 1 

Element integrals 0.5 

Surface integrals 

Half step calculations 13 

Second step calculations 12 

Artificial dissipation 56 

Solution of global equations 6 

Application of boundary conditions 1 

Local time-step computations *7 
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Table 4.3 

Timing Data for Principal Operations for inviscid 
Petrov-Galerkin Algorithm 


OPERATION 

CPU TIME 
(%) 

Input 

1 

Element information 
(element interpolation 
derivatives, etc.) 

0.5 

Computational of element 
residuals 

86 

Assembly of element residual 
matrices 

1 

Solution of global equations 

4 

Application of boundary 
conditions 

1 

Local time-step computations 

6 
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Taylor-Galerkin formulation it is seen that the addition of artificial 
dissipation takes over 50% of the total CPU time. The element inte- 
grals that need to be evaluated for the artificial dissipation terms 
use four point Gauss integration which, even when fully vectorized, are 
seen to be computationally expensive. The timing information for the 
Petrov-Galerkin formulation indicates that major portions of the total 
CPU time is shared between the various terms needed in the evaluation 
of the element residual vectors. 

On supercomputers, such as the VPS-32, programs are usually 
written to have all the working arrays stored in central memory. The 
capability of the two finite element programs to handle large problems 
can be gauged by comparing the central memory required to work a sample 
problem. For a problem containing 1800 elements and 1911 nodes the 
storage required for the two algorithms is : 

* 

Words of memory Large pages 

Taylor-Galerkin 350,000 6 
Petrov-Galerkin 325,000 5 

These numbers show that the Petrov-Galerkin algorithm requires about 
10% less storage than the Taylor-Galerkin algorithm. This difference 
can be attributed to the Taylor-Galerkin algorithm computing and 
storing all the element integrals outside of the time-step loop. 

The finite element programs developed are pseudo-steady state 
codes wherein the steady state solutions are obtained by time 
marching. Convergence is assumed to occur when the L2 norm of the 
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conservation variables drop over three orders of magnitude. For such 
time marching schemes fast convergence rates are essential. The faster 
a program converges the less the computational cost. The use of local 
time-stepping procedures to improve convergence was demonstrated 
earlier and both formulations were implemented with the time-stepping 
procedure of section 4.2. Experience gained working numerous in viscid 
problems indicate that the Taylor-Galerkin formulation can be run at 
higher values of the safety factor a. Typically the Taylor-Galerkin 
formulations can run at values of a over 0.6, while the Petrov-Galerkin 
formulations usually work at a values of about 0.3. The trend is typi- 
cal of upwind schemes which need lower safety factors than the explicit 
artificial dissipation schemes. 

The rate of convergence for both formulations can be compared by 

plotting the L 2 norm of the changes in the conservation variables. 

For Mach 3 flow over the compression corner, the L 2 norm of the 
changes in the. regular conservation variables for the Taylor-Galerkin f 
algorithm and in the entropy variables for the Petrov-Galerkin 
algorithm appear in Figs. 4.26 and 4.27. The figures indicate that all 
the conservation variables follow similar trends regarding the rates of 
convergence, and the Petrov-Galerkin algorithm is seen to converge an 
order of magnitude more than the Taylor-Galerkin algorithm. 

A realistic measure of the convergence rates can be obtained by 
comparing the rate of convergence of the total pressure force on a body 

surface. For the compression corner the pressure force on the inclined 

wall can be computed exactly since the flow parameters after the shock 
can be obtained from the oblique shock relations. Figure 4.28 plots 
the time history for the total pressure force on the inclined wall for 
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Fig. 4.28 Comparative convergence of finite element algorithms based on 
pressure force on compression corner wall . 
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the Taylor-Galerkin and the Petrov-Galerkin algorithms. The pressure 
force computed using the exact shock relations is also indicated in 
Fig. 4.28. The Taylor-Galerkin algorithm reaches the value of the 
pressure force given by the exact solution in about 250 iterations 
while the Petrov-Galerkin needs about 500 iterations to reach the same 
pressure force value. The fast convergence for the Taylor-Galerkin 
algorithm can be attributed to the higher safety factor a used for 
computations. 

The convergence trends implied by Figs. 4.26 and 4.27 are at odds 
with those indicated by Fig. 4.28. The use of the pressure force is 
seen to be a better indicator for convergence rates than the L 2 norm 
of the change in conservation and entropy variables. The use of total 
pressure force for inviscid flows and shear or heat fluxes for viscous 
flows for evaluating convergence rates merits further investigation. 

4.5 Closing Comments 

The vectorization strategies of Chap. 3 have been applied to the 
Taylor-Galerkin and Petrov-Galerkin algorithms for 2D inviscid flows. 
This chapter has presented comparative results for the two algorithms. 
The algorithms have basic differences in philosophy which include the 
mode of artificial dissipation and the method of integral evaluations. 
The vectorization of the algorithms resulted in programs of comparable 
computational speeds and storage. The vectorization strategies used 
for the 2D inviscid flows are extended to 2D viscous flows in the next 
chapter. 



Chapter 5 

VISCOUS 2D COMPUTATIONS 


In the realistic problems encountered in high speed compressible 
flows, the flow behavior is influenced a great deal by viscous 
effects. For supersonic and hypersonic flight vehicles the effects of 
viscous heat dissipation, and shock -viscous interactions may play a 
crucial role in the performance of the vehicle. 

The flow features for high Reynolds number compressible flows can 
be predicted by two approaches. The first approach is to divide the 
flowfield into an inviscid region and a viscous region. A matching of 
the viscous boundary conditions enables the coupling of the viscid and 
inviscid regions. This approach reduces computational costs but is 
limited in applications to problems not involving effects such as flow 
separation or shock -boundary layer interactions [48]. The second 
approach is a global approach, wherein the Navier-Stokes equations, 
which are valid throughout the entire domain, are used to predict flow 
details everywhere. This approach is relatively straightforward but 
computationally difficult; however, the approach is necessary for com- 
plicated viscous problems such as shock -boundary layer interactions. 
In this dissertation a finite element Petrov-Galerkin formulation for 
the compressible Navier-Stokes equations is used to predict 2D viscous 
flow characteristics. 
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5.1 Navier-Stokes equations 


The compressible Navier-Stokes equations describe the characteris- 
tics of a laminar, compressible, viscous, heat conducting fluid and can 
be written as. 


U,. + F. . = fY . + F 1 ? . 
* t i ,i i,i i,i 


(5.1) 


where U is the vector of conservation variables and fY, fY, and f|? 
are the fluxes that correspond to the advection, viscous dissipation, 
and heat diffusion respectively. The vector U and the flux vectors in 
two dimensions are given by, 
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(5.2) 

Here p is the density, p the pressure, u-j the velocities in the 
coordinate directions, q-j the heat fluxes, the viscous stress 

components and E^ the total energy. The Kronecker delta 6 i j is as 
defined in Eq. (2.14). The shear stresses and the heat fluxes are 
gi ven by , 


t . . = X u. , 6..+u (u • . + u • •) 
ij k,k u ij ** 1 i,j j,i ' 


(5.3) 

q,. = -k T,. (5.4) 

where v and x are the coefficients of viscosity, k the thermal conduc- 
tivity, and T is the temperature. 
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The Petrov-Galerkin formulation uses entropy functions to symme- 
trize the Mavier-Stokes equations. The advantage of using a symmetric 
system of equations is that a weighted residual formulation based on 
these equations automatically inherits the stability possessed by the 
exact solution of these equations [14]. 

The spatial derivatives of the inviscid, viscous, and heat fluxes 
can be written as, 


F. • = F. U,. = A. U, . 
i,i i ,U *i i *i 

c v = k v n 
F i K 1*j U ’j 


F^ 1 = m 

F i K ij U ’j 


(5.5) 

(5.6) 

(5.7) 


Using the above relations in Eq. (5.1) the Navier-Stokes equations 
become. 


U ’t + V-i * + K ij 1 ’f 


(5.8) 


A change of variables is introduced by defining new independent 
variables V to replace the conservation variables U. A one-to-one 
mapping is assumed between U and V and Eq. (5.8) transforms to. 


where 




- (K ij V’i 


(5.9) 


A q = U, y (5.10) 

A. = A.A Q (5.11) 

^ij= K i/o (5 ‘ 12 > 

The matrices A 0 , A-,- and are symmetric, and A 0 and K-j j are 

positive definite. The terms of these matrices are given in [39]. The 

definition of V is based on the entropy functions and is given by 
Eqs. (2.47)— (2.51). 
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5.2 Finite element formulation 


The weighted residual formulation for (5.9) is given by. 



<v>t + V*i 


- (lf 1j * - 0 


(5.13) 


where W is a matrix of weighting functions different from the shape 
functions [N] that appear in Eq. (2.12). The weighting function for 
the viscous formulation is based on the weighting function used for the 
inviscid formulation Eq. (2.34). The modification in W in Eq. (5.13) 
is the need to gauge the^ relative importance of the artificial and 
viscous diffusion in specific regions of the flow. The weighting 
functions W for the streamline upwind part of the Petrov-Galerkin 
formulation is given by Eq. (2.54) as. 


W * N + (A. T|a | “ 1 t T ) T N,. . (5.14) 

where the matrix T depends on the eigenvalues of the Jacobian matrices 
A-j as well as on the local Peclet number. The Peclet number is a 
measure of the relative importance of the convective and diffusive 
effects and is defined as, 


Pe 



(5.15) 


where x-j is defined by the eigenvalue problem for T-j , and k is the 
thermal diffusivity. In the boundary layer, the local Peclet number is 
small, and viscous diffusion terms predominate. In the inviscid domi- 
nated regions, the local Peclet number is very large (advection domi- 
nates), and the numerical diffusion terms come into play. The use of a 
doubly asymptotic function of the local Peclet number to limit the 
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effects of numerical dissipation in the boundary layer is discussed in 
[39] and is used here for the viscous calculations that follow. 

5.3 Sample Problem 

The performance of the vectorized Petrov-Galerkin formulation is 
illustrated using the steady viscous supersonic flow past an isothermal 
flat plate as a test problem, Carter [49]. The solution domain and the 
discretization for this problem appear in Fig. 5.1. The finite element 
mesh for the problem contains 3111 nodes and 3000 elements. The mesh 
is graded near the leading edge to resolve the leading edge effects. 
The inflow is supersonic at Mach 3, and a Reynolds number of 1000 based 
on the length of the flat plate is assumed at inflow. The viscosity of 
the fluid was assumed to vary according to the Sutherland law, and the 
thermal conductivity was obtained from a constant Prandtl number taken 
to be 0.72. Figure 5.2 shows the density contours for the flowfield at 
convergence using the Petrov-Galerkin algorithm. The presence of the 
leading edge shock and the boundary layer is clearly visible from the 
contours. The results obtained from the Petrov-Galerkin formulation 
are compared to those of Carter [49] and Taylor-Galerkin results, 
[50]. Carter's mesh was 45x50 and the mesh used by the Taylor-Galerkin 
algorithm was 66x51. The distribution of density and velocity at the 

V 

outflow for the three methods appears in Figs. 5.3a - 5.3c. The 
density and u-velocity distribution for the three methods compare 
well. The Petrov-Galerkin distributions for the v-velocity show a kink 
at the wall but away from the wall the distributions agree well with 
those of Carter and the Taylor-Galerkin algorithm. The discrepancy at 
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Fig. 5.1 Flow configuration and finite element mesh for Mach 3 flow 
over flat plate. 

\ 
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(a) Comparative density distributions at exit 



U 

(b) Comparative u-velocity distributions at exit 


Fig. 5.3 Comparative distributions at the wall and at the outflow for 
flow over flat plate. 
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(d) Comparative pressures distributions along wall 


Fig. 5.3 Comparative distributions at the wall and at the outflow for 
flow over flat plate. 
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the wall is due to the Petrov-Galerkin formulation being sensitive to 
the outflow conditions close to the wall where the flow is subsonic. 
Fig. 5.3d shows the distribution of pressure along the plate for the 
three methods. The Petrov-Galerkin and Taylor-Galerkin algorithms show 
a smooth variation along the flat plate while the pressures predicted 
by Carter show a few oscillations at the leading edge. The plots 
indicate the validity of the computing procedure for 2D viscous flows. 

5.4 Comments on 2D Viscous Program 

As with the inviscid terms, the viscous terms in the Petrov- 
Galerkin formulation are nonlinear and need to be evaluated using Gauss 
quadrature. The vectorization strategies used to vectorize the in- 
viscid terms can be extended to the integrals that arise from the addi- 
tion of the viscous terms. The assemblage of the residual vectors due 
to the viscous terms and the solution of the system equations is simi- 
lar to the procedures outlined in Chap. 3. 

Figure 5.4 details the program flowchart for the 2D Petrov-Galerkin 
viscous formulation. The CPU seconds required for the main operations 
in the formulation appear in Table 5.1. Computation of the viscous 
terms takes a relatively small fraction of the total time (15 %), with 
an increase in storage of less than 5 %. Typical computational speeds 
of the 2D viscous Taylor-Galerkin and Petrov-Galerkin algorithms are: 

Taylor-Galerkin - 4.5 x 10"5 CPUs/time-step/node 
Petrov-Galerkin - 10.5 x 10"5 CPUs/time-step/node 
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Table 5.1 

Timing Data for Principal Operations for Viscous 
Petrov-Galerkin Algorithm 


OPERATION CPU TIME 

(*) 


Input , 1 

Element information 

(element interpolation 0.5 

derivatives, etc.) 

Inviscid terms 80 

Viscous terms . 13 

Solution of global equations 4 


Application of boundary conditions 


1 
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The storage needed for the two algorithms can be compared by comparing 
the total arrays, both real and integer, needed to work typical prob- 
lems. Storage requirements for the two formulations are given by, 

Taylor-Galerkin - 60 x NPOIN + 336 x NELEM 

Petrov-Galerkin - 13 x NPOIN + 170 x NELEM 

where NPOIN and NELEM are the number of nodes and elements in the 
domain. The figures indicate that the Petrov-Galerkin formulation uses 
less than 50% of the storage needed for the Taylor-Galerkin formula- 
tion. The element integrals for the inviscid and viscous terms are 
calculated and stored before the start of the transient loop for the 
Taylor-Galerkin, and thus the need for a substantially larger storage. 

A drawback of the Petrov-Galerkin viscous formulation, as men- 
tioned earlier, is the need for numerical integration. For 2D problems 
the use of 2x2 Gauss quadrature is seen to be adequate. This implies 
the need for 2x2x2 integration procedures for 3D analysis. In the vec- 
torized program, the Gauss integration procedure needs to store the 
derivatives of the shape functions at the nodes for all elements at 
each Gauss point. For 3D problems this results in prohibitive storage 
requirements. Thus the development of accurate one point reduced Gauss 
integration will be necessary before the Petrov-Galerkin algorithm can 
be extended for 3D viscous flows. 



Chapter 6 

COMPUTATIONS FOR 3D INVISCID FLOWS 


The use of vectorization strategies in simulating 2D compressible 
flow situations showed significant computational benefits. Vectoriza- 
tion is highly desirable for 2D flows but is essential for detailing 3D 
inviscid and viscous flows. To gain a further understanding of the 
role of vectorization in 3D, the strategies developed in Chap. 3 were 
used to implement the Taylor-Galerkin algorithm for three dimensional 
inviscid flow. Model generation procedures, display of results, 
details of the finite element formulation, and quality of solutions 
obtained are discussed in this chapter. 

6.1 Model Generation and Results Display 

The commercially available PATRAN program [19] is used extensively 
for finite element structural analysis. Modelling features of PATRAN 
can be exploited for fluid flow analysis. Modelling compressible fluid 
flow with PATRAN begins with the creation of a geometric representation 
of the computational domain. A 3D computational domain is shown in 
Fig. 6.1 and is represented by solid regions called hyperpatches. 
Finite element meshes are created by subdividing the hyperpatches into 
hexahedron or tetrahedron elements. The use of these elements depends 
on the capability of the analysis program. In the problems that 
follow hexahedron elements are used exclusively. 
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Translator and inverse translator programs [19] have been devel- 
oped at NASA Langley to interface PATRAN with flow analysis programs. 
The translator program uses output files generated by PATRAN to produce 
input data for a finite element analysis program. The results obtained 
from the analysis program are converted into the- format required by 
PATRAN for results display by the inverse translator program. 

PATRAN has extensive capability to display scalar solution results 
with contour plots. Contours may be curved lines which connect points 
of constant value or solid, color-filled bands which define the limits 
of certain ranges of the quantity being displayed. Regions of interest 
in the model (e.g. surfaces, symmetry planes, outflow planes, etc.) are 
identified and elements in these regions are grouped together in 
"active sets." Active sets are smaller than full models, containing a 
few hundred elements and can be used effectively to display the salient 
characteristics of the entire flow. Figure 6.2 illustrates the concept 
of an active set and results display for a typical active set. 

6.2 Taylor-Galerkin Algorithm 

The Taylor-Galerkin algorithm was introduced in Chap. 2 and the 
extension of this formulation to three dimensions is straightforward, 
but key equations are repeated here for reference. 

The 3D Euler equations in conservation form are given by, 

u -t + F i,i ■ 0 


( 6 . 1 ) 
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where U is the vector of conservation variables, and Fj are the flux 
vectors of mass, momentum and energy in the coordinate directions. The 
vector of conservation variables, U, and the flux vectors Fj are 
gi ven by , 


/ 1 \ 


f° ^ 

U 1 


I 5li 

u 2 

> F,. = u^U + p i 

' 6 2i 

i . 

u 3 1 

6 3i 



Uj 


where p is the density, Uj the velocity components in the x, y, and z 
directions, and E^ is the total energy. The pressure p is defined 
as, 

p - (y - 1) p CE t - 0.5(u i u i )] (6.3) 

Equation (6.1) is solved subject to proper initial and boundary condi- 
tions. The Taylor-Galerkin formulation is easier to derive by consid- 
ering just one variable. For a typical variable u Eq. (6.1) can be 
written as, 

u, t + Fjj = 0 (6.4) 

The computation proceeds through two time levels t n +i /2 and tp+j. 
At time level t n+ j/ 2 , values for u are constant within an element 
while at time t n+ j, these constant element values are used to compute 
nodal values of u. 

Time level t n+ i/ 2 : 

The constant element value is computed from, 

V u d +1/2 = f [N] dV {u} " " T- f CN ’i ] dV {F i }R (6 ’ 5) 
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where V denotes an element volume. At is the time-step, and [N] is the 
matrix of element interpolation functions. On the outflow surfaces the 
weighted residual equation is written as, 

A u 0+1/2 = J [N S D dA {u) n - f [N $ ] dA {F.^ .} n (6.6) 

A - A 

In the above, [N s ] denotes the interpolation functions of the 
gradients of the flux components on the outflow surfaces, and A is the 
outflow surface area. The element and surface quantities at t n+ i /2 
are used to obtain the nodal values at time t n+ i« 

Time level t n+ i: 

An approximation to the Taylor series expansion of u at t n+ j and 
the application of the weighted residual statement on the resulting 
equation yields, 

[M]{ u} n+1 = [M]{u} n + At J [N m 3 dV E? +1/2 + (R} n+1/2 (6.7) 

where CM] is the element consistent mass matrix given by, 

[M] = / [N] T [N] dV (6.8) 

J V 

and the load vector { R} is given by, 

{R} 0+1/2 = -At \ Z . E°? 1/2 [N] dA (6.9) 

J A i si 

where are the components of the unit normal surface vector n. The 
flux components on the surface, E°| 1/2 ane obtained using the surface 
quantities, u 0+1/2 , computed at the half step. 
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To handle flows with sharp gradients such as shocks, artificial 
dissipation is added at the end of each timestep. The 3D formulation 
uses Lapidus dissipation Eq. (2.30), and the addition of dissipation is 
of the form, 

u n+l _ u n+l + v At ^ . (6.10) 

where, 

o . 

E. = h u . . u, . (i not summed) (6.11) ' 

I I j I I I 

where v is the Lapidus coefficient, At the timestep, and h a character- 
istic element length. 

The element integrals that appear in Eqs. (6.5M6.9) were 
evaluated using closed form integration. The use of numerical integra- 
tion in three dimensions would result in inflated storage requirements 
and increased computational expense. In [31] the CPU time required for 
closed form solution of the mass matrix, Eq. (6.8), is compared to the 
CPU time require for computing the mass matrix with different orders of 
Gauss quadrature. Significant savings in CPU times are indicated for 
the closed form integration, and this is of considerable importance in 
the development of efficient 3D compressible flow codes. 

Of the three artificial dissipation models detailed in Chap. 2, 
the Lapidus dissipation model requires the least computations and mini- 
mal storage. The Lapidus dissipation Eq. (6.10) being highly nonlinear 
is evaluated using numerical integration. The order of the terms in 
Eq. (6.10) indicates the need for second order integration (in 3D, 8 
integration points), but preliminary numerical experiments indicate 
that one-point Gauss quadrature appears adequate for most problems. 
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6.3 Computational Speed and Storage 

The 30 vectorized finite element program was used to analyze flow- 
fields for problems ranging from a small number of nodes (a few thou- 
sands) to these needing. -large numbers of nodes (about 36,000). Exper- 
ience gained from these problems indicate the speed of computations for 
this program to be 6 x 10 "^ CPU seconds per time-step per node. This 

figure is competitive with existing finite difference codes. Another 

important concern is the storage reauired for the finite element 
program to run realistic problems. The VPS-32 has 32 million full 
precision (64 bits) words of central memory which translates to over 
500 large pages (65,536 words make up a large page) of memory. The 
finite element program was written as an "in-core" program, which 
imposes a ceiling on the size of the model that can be used for flow 
simulation. The storage needed for a problem can be estimated from the 
relation. 


MT0T = 40 x NPOIN + 270 x NELEM 

where MTOT is the total storage required. NPOIN and NELEM are the num- 
ber of nodes and elements, respectively. In terms of large pages 
MTOT/65,536 has the upper limit of 448, beyond which "page faulting" 
degrades the performance of the program. Thus, the maximum problem 
size that the finite element program can process is about 120,000 
nodes. If larger problems are encountered major modifications to the 
program are required. A possible strategy would be to restructure the 
data in such a way that paging, even when it occurs, is handled 
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efficiently. Programming strategies such as "packing" and "unpacking" 
memory, recalculation instead of storage, and use of efficient input- 
output requests can also help in handling very large problems. 

6.4 3D Sample Problems 

The capability of the computational procedure to. accurately calcu- 
late internal and external flowfields containing expansions, shock 
waves, and shear regions is demonstrated by two sample problems. Com- 
parison solutions obtained by other methods are also presented to gauge 
solution accuracy and shock resolution. 

6.4.1 Square Nozzle 

The first problem presented is the flowfield in an expansion- 
recompression square nozzle (Fig. 6.3). The problem is reduced to 
one -fourth its size using symmetry with the flow region being bounded 
by two planes of symmetry, an upper wall and a side wall. The flow 
field is characterized by expansion waves emanating from the upper and 
the side walls in the region 0<x<5. In the region 5<x<10, the flow is 
recompressed resulting in shock waves emanating from the upper and side 
walls. The shock waves intersect at x=17 and reflect from the symmetry 
planes. 

The inlet Mach number is 2.94 and a finite element mesh consisting 
of 7865 nodes and 6400 elements was used to march the solutions to 
steady -state. Convergence was indicated by a decrease in the L2 norm 
of the density changes by three orders of magnitude. The computational 
procedure used a global time-step that satisfied the CFL criterion. 
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Fig. 6.3 Expansion-recompression square nozzle. 
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Figure 6.4 shows the pressure contours on the symmetry plane of 
the nozzle at z=0. The presence of the expansion waves at the inlet 
section and the formation of the shock wave and its subsequent 
reflection from the symmetry plane at x=17 is apparent in the figure. 
Figures 6-5 and 6.6 compare the finite element solution with the 
solutions obtained from a reference plane finite difference procedure 
[50]. Figure 6.5 shows the axial variation of pressure at the 
intersection of the planes of symmetry (y=z=0). The sharp increase in 
pressure at x=17 occurs due to the intersection of four shock waves 
emanating from the walls of the nozzle. The variation of pressure 

along the corner formed by the intersection of the upper wall and the 
side wall is shown in Fig. 6.6. These figures indicate that the flow 
features are accurately predicted by the finite element solution. 

6.4.2 Scramjet Exhaust Flow 

The second problem is the flow field associated with an outboard 
module of a hypersonic research aircraft. A hypersonic vehicle and 
typical flow features downstream of the nozzle exhaust are shown in 
Fig. 6.7. The external flow, both below and beside the nozzle, inter- 
acts with the nozzle outflow, and the complete geometric configuration 
to be modelled is detailed in Fig. 6.8. 

To analyze the three dimensional shear flow, the problem was split 
up into three subproblems: (1) a 3D divergent nozzle, Fig 6.9a, (2) a 

2D expansion over the vehicle body. Fig 6.9b, and (3) the 3D region 
downstream of the nozzle exit. Fig 6.9c. 

The 3D flow field in the divergent nozzle was modelled with 8721 
nodes and 7168 elements. The inlet Mach number is 1.657 and the 
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Fig. 6.4 Pressure contours at the symmetry plane (z=0) of nozzle. 
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Fig. 6.5 Comparative pressure distributions along intersection of 
symmetry planes of square nozzle. 
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Fig. 6.6 Comparative pressure distributions along upperwall /si dewall 
corner of square nozzle. 
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Fig. 6.7 Hypersonic research vehicle and flow features downstream of 
engine exhaust. 
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Fig. 6.8 Geometric configuration of an engine outboard module. 




NOZZLE INLET ^ 
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(a) 3D divergent nozzle 
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Fig. 6.9 Subproblems for scramjet exhaust flow. 
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specific heat ratio is 1.27. Figure 6.10 shows the predicted pressure 
contours on a layer of elements on the symmetry plane of the nozzle. 
Flow enters parallel to the x axis, expands in the inlet section of the 
nozzle, and then a further expansion occurs downstream due to the 
downward turn of the nozzle wails. The second subproblem is a 
Prandtl -Meyer expansion with an inclined outflow plane. A 2D model 
consisting of 2057 nodes and 1920 quad elements was used to analyze 
this flowfield. The exit solutions from these two problems form the 
inlet condition for the third subproblem. This problem, the 3D shear 
region downstream of the nozzle, is modelled with a finite element mesh 
of 11,781 nodes and 10,240 elements. 

The flowfield that results in the 3D shear region includes expan- 
sions, shocks and contact surfaces. The exhaust from the nozzle inter- 
acts with the freestream below the nozzle, producing an expansion 
region, a plume shock, and an interface or contact discontinuity. The 
flow field caused by the exhaust of the nozzle and the expansion over 
the vehicle body near the nozzle sidewall also results in a sidewall 
shock, interface and expansion waves. The intersection of these two 
families of expansions results in a complex three dimensional flow- 
field. 

Figure 6.11 presents the predicted pressure contours on the 
outflow of the 3D shear region. The flow field on the left and the 
bottom of the outflow are seen to be relatively undisturbed. The 2D 
simple expansion waves are seen on the left by the horizontal contours 
H-N, and the lower region without contours remains undisturbed at free 
stream values. The presence of the shock envelope that comes off the 
bottom and the side walls of the nozzle is indicated by the closely 
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spaced contours while the complex expansion wave interactions appear in 
the upper right regions of the figure. The top right corner of the 
flow field is the region outside the intersection of the expansions and 
remains at freestream pressure. 

The interaction between the nozzle outflow and the freestream is 
further illustrated by pressure contours on a layer of elements in the 
symmetry plane as shown--in Fig.. 6.12. The expansion waves generated 
and the shock that results (see Fig. 6.7) indicate the qualitative 
accuracy of the solution procedure. 

The finite element results are compared with predictions from the 
GIM [44] program in Figs. 6.13 and 6.14. GIM (General Interpolants 
Method) combines features of the finite element and finite difference 
methods. Figure 6.13 shows the pressure distribution at the outflow 
plane of the shear region along a vertical line in the symmetry plane. 
The expansion region and the shock waves are displayed by both 
methods. The pressure distribution predicted by the finite element 
approach appears to be more realistic than the GIM results as indi- 
cated, for example, by the prediction of the sharper shock. Post-shock 
oscillations appears in the finite element results while the GIM 
results are seen to be overly smooth especially near the shock. 

A further comparison of the two methods appears in Fig 6.14 where 
the pressure distribution at the outflow plane is plotted normal to the 
symmetry plane. The finite element scheme predicts the expansion near 
the symmetry plane and the weaker shock that results from the sidewall 
nozzle exhaust/freestream expansion. In contrast, the GIM results show 
an almost continuous expansion to freestream pressure values. 
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Fig. 6.12 Pressure contours on a layer of elements 
the 30 shear region. 
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Fig. 6.13 Pressure distribution along vertical line in outflow plane of 
3D shear region. 



Fig. 6.14 Pressure distributions along horizontal line in the outflow 
plane of 30 shear region. 
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6.5 Closing Comments on the 3D Formulation 

This chapter demonstrates the capability of the Taylor-Galerkin 
algorithm to simulate complicated inviscid flow for 3D problems. The 
computing speed of the program is competitive and indicates good 
possibilities for extension of the formulation to 3D viscous flows. 
The addition of the viscous terms appears to be straightforward and 
will be the subject of future research efforts. 



Chapter 7 
CONCLUDING REMARKS 

7.1 Conclusions 

The development of strategies for effective vectorization of 
finite element compressible flow programs is described in this study. 
The vectorization procedures are tailored to exploit the hardware and 
software characteristics of the NASA Langley VPS-32. The use of these 
strategies for 2D and 3D inviscid and viscous flow computations is 
demonstrated. 

The basic principles of shock capturing methods are described. 
Two shock capturing finite element algorithms, the Taylor-Galerkin and 
the Petrov -Gal erkin, are described. The Taylor-Galerkin algorithm uses 
explicit artificial dissipation, the Petrov-Galerkin algorithm is based 
on streamline upwind methodology. The use of three explicit dissipa- 
tion models for the Taylor-Galerkin algorithm is described. Results 
obtained using Lapidus, MacCormack-Bal dwin, and Jameson dissipation 
models are compared. 

The Taylor-Galerkin algorithm with MacCormack-Bal dwin dissipation 
and the Petrov-Galerkin algorithm are used to solve a variety of 2D 
inviscid flow problems. Comparisons for the two algorithms are made 
using criteria such as solution quality, shock resolution, computa- 
tional speed and storage requirements. Results obtained show good 
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shock capturing properties for both methods. The Taylor-Galerkin 

algorithm exhibits local spurious oscillations at compression and 
expansion corners. The Petrov-Galerkin formulation shows minimal 
oscillations and good shock resolution. Results for the hypersonic 
flow ever a blunt leading edge using the Petrov-Galerkin algorithm 
indicates the need to implement an "integration by parts" procedure to 
ensure flux conservation. 

The computational speeds for the formulations indicate the 
Taylor-Galerkin algorithm to be about' three times faster than the 
Petrov-Galerkin algorithm. Local time-stepping procedures implemented 
on both formulations show the capability of the Taylor-Galerkin 

algorithm to be run at higher Courant numbers, around 0.6, compared to 
about 0.3 for the Petrov-Galerkin algorithm. A comparison of the 
storage needed for the 2D programs indicates that Petrov-Galerkin 

algorithm needs only 90 % of the storage needed by the Taylor-Galerkin 
algorithm. The size of storage needed may not be critical for 2D prob- 
lems but becomes significant for 3D computations. 

The vectorization strategies developed for the 2D inviscid 
Petrov-Galerkin algorithm is extended to include the effects of vis- 
cosity and heat conduction.. The 2D viscous Petrov-Galerkin algorithm 
is validated by simulating the supersonic flow over a flat plate. 

Results obtained from the Petrov-Galerkin algorithm are compared with 
results from a finite difference method and a 2D viscous Taylor- 
Galerkin algorithm. The comparison of computational speeds of the two 
viscous algorithms indicates the Taylor-Galerkin algorithm to be twice 
as fast as the Petrov-Galerkin algorithm. The slower speeds for the 
Petrov-Galerkin for both the inviscid and viscous algorithms are due to 
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the need for numerical integral evaluations. Comparisons of storage 
needed for the two viscous algorithms shows that the Petrov -Gal eric in 
algorithm needs less than 50% of the storage needed by the Taylor- 
Galerkin algorithm 

The extension of the vectorization strategies for 3D in viscid 
Taylor-Galerkin computations is described. Two sample problems are 
shown to demonstrate the capability of the finite element procedure to 
model flow details such as shocks, expansions and shear layers for com- 
plicated three dimensional compressible flows. Results obtained using 
the finite element procedure are compared with results from the refer- 
ence plane finite difference method and the General Interpol ants 
Method, GIM. The computational speeds obtained with the 3D vectorized 
Taylor-Galerkin procedure is seen to be competitive with existing 
finite difference codes. 

The development of an efficient vectorization procedure for 
general explicit finite element flow algorithms and applications of 
this procedure to 2D and 3D inviscid and viscous flows are demon- 
strated. The computational rates obtained for both finite element 
algorithms indicate the ability of the procedure to handle complex 
three dimensional compressible flows requiring nodes that number in the 
hundreds of thousands. Much research remains to be done to be able to 
accurately predict aerothermal -structural interactions for high speed 
vehicles, but a computational procedure has been developed in this 
study that will play a major role in achieving this goal. 
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7.2 Recommendations for Further Research 

The finite element methods described in this study possess good 
shock capturing properties. Shock resolution can be enhanced further 
by local mesh refinements. In most flow situations the exact location 
of shocks or other gradients is not known a-priori. This underlines 
the need to develop adaptive mesh refinement procedures. Refinement 
procedures can be developed independent of the finite element algorithm 
and can be interfaced with either formulation. 

The analysis of high speed viscous flow requires very refined mesh 
spacings close to the surface of the body to capture details of the 
boundary layer. The computational expense for viscous analyses are 
usually more expensive than inviscid analyses. A good procedure for 
viscous analyses is to start off with an inviscid analysis of the flow 
region. The results obtained from the inviscid analysis contain 
details on features such as shocks and expansions. The development of 
programs to accurately interpolate the mesh information from inviscid 
to viscous grids is very desirable since it results in better initial 
conditions and faster rates of convergence for the viscous problem. 

The Petrov-Galerkin formulation has demonstrated properties for 
good shock resolution, solution accuracy, fast convergence and lesser 
storage requirements. The main drawback of this method is the slower 
computational speeds for both inviscid and viscous formulations. The 
development of accurate one point integration procedures will improve 
processing rates and result in a finite element formulation with 
excellent computational properties. 
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The development of accurate one point integration schemes will 
also help in improving the processing rates of the Tayl or-Galerkin 
algorithm. The timing data for the Taylor-Galerkin formulation indi- 
cates that over 50% of processing time is spent in the numerical inte- 
gration needed for addition of artificial viscosity. Use of accurate 
one-point integration schemes could double the processing rates for the 
Taylor-Galerkin algorithm. 

Viscous flow problems require very small grids spacings at the 
walls and these spatial dimensions may necessitate implicit treatment 
of the elements that lie inside the boundary layer. Vectorization pro- 
cedure may have to be developed for mixed implicit-explicit solution 
procedures. 

The current interest in hypersonic flows in the range of Mach 10 
to Mach 25 indicates the need to include real gas effects in the solu- 
tion procedure. The need to develop vectorization strategies for these 
effects is essential to solve realistic hypersonic viscous flow 
problems. 
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APPENDIX A 


COMPUTATION OF NODAL SECOND DERIVATIVES 


The dissipation models of MacCormack-Baldwin, Eq. (2.34) and 
Jameson, Eq. (2.40) need second and third derivatives of nodal 
quantities such as pressure and the conservation variables. The second 
derivatives at the nodes of an element can be obtained variational ly 
from the first derivative defined within that element using Greens 
formula. For example, the second derivatives in the x direction can 
be written as. 


Let 


U ’xx “ ^ U, x^x 


L (u ) = u,, 


The adjoint operator for L(u) is given by, 

L*(v) = -v, x 

By Greens formula, 


f [v L(u) -u L*(v)] dA = f [Qi 
J A JD 


- Pj] • n ds 


(A. 1 ) 


(A. 2) 
(A. 3) 


(A. 4) 


where A is the region enclosed by D, the boundary of the region, n is 
the outward normal to the boundary D. P and 0 are evaluated on the 
boundary and depend on the differential equation. The values of P and 
Q are given by, 


P = 0 

Q = uv (A. 5) 
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Substituting the above in Eq. (A. 4) results in. 


[ [v L (u ) - u L*{v)] dA =/ uv (i*n) ds 
J A. J A 


(A. 6) 


The boundary term is assumed to vanish for outflow surfaces and 
Eq. (A. 6) reduces to, 



(A. 7) 


where u and v are functions of x and are well behaved. Let u and v be 
defined as. 


v = N 

u = U> x (A. 8) 

where N is a matrix of element interpolation functions and U is a nodal 
variable. Substituting the values of u and v in Eq. (A. 7) results in, 

/ N U, YY dA « -/ U, Y N, y dA (A. 9) 

j A xx Ja * x 

Assume an interpolation of the second derivatives at the nodes given 
by. 


U ’xx = N {U ’xx } (A. 10) 
and the first derivatives is evaluated inside the element, typically at 
a Gauss point. The resulting equation is, 


L 


{ N} [N] dA {U 


■xx> ■ -/». 


X " U ’X 


(A. 11) 


which can be written following Eq. (2.20) as, 

CM] tU, xx J = -f H. x dA U, x 


(A. 12) 


An alternate procedure to obtain the nodal second derivatives is 
to assume a linear variation of the first derivatives at the nodes, 
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U, x = N {U, x > (A. 13) 

The second derivatives at the nodes can then be written as, 

W(U, x x)={n. x »«(#, x ) (A. 14) 

The procedure adopted in this dissertation is to compute the 
second derivatives of pressure as given by Eq. (A. 12) but to compute 
the second and third derivatives of the conservation variables along 
the lines of Eq. (A. 14). 


APPENDIX B 


DEFINITION OF JACOBIAN MATRICES FOR COMPRESSIBLE FLOW EQUATIONS 
The following matrices are the Jacobians of the Euler fluxes with respect to the 

0 0 \ 

-(7- l)u, (7-1) 

0 0 

tt t 0 

-(7 - l)tt,o, Ttt! 


a*i * 

«, 1 * «t ((7 - 1 )«* - 7*) 

«« * 7« - (7 - l)(tt? + tt**/2) 
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1 P 

0 \ 



-ttitt. 

tt. 

«, 0 
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<hi 

-(7 - l)tt, 

-(7 - 3 )u, -(7- l)u. 

( 7 - 1 ) 



“ttjtij 
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«, 0, 

0 



«S» 

-(7- l)ttiO, 

a„ -{7 - l)«,tt. 

7 **, 
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« 5 t = 

((7 - l)tt* - 7e) 
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0 1 
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-ttitt. 

«, 
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tt, tt. 

0 



<*« 

-(7 - l)o, 

— (7 — 1 )**, -( 7 - 3 ) 0 , 

( 7 - 1 ) 



(1st 

-(7- l)tt|tt. 

-<7 - !)«,«, 

Ttt, 


V J 


conservation variables: 


0 

1 

0 

0,1 

-(7 - 3)«, 

-(7- !)«, 

-tt,o. 

«, 

«i 


«*, 

0 

0,1 

<*»» 

-(7 - !)«»«* 
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«« = 

<*S1 = Uj((7 - l)u s - 7«) 

<*S4 ~ 7* ~ (7 ~ 1)(“J + «*/2) 

All the formulas now refer to V. The following combinations of variables are 
introduced to simplify subsequent writing: 


7 = 7-1. 

*» = *?- 27 *, + 7 , 

t _K’ + V? + I? 
*' “ 2 r 5 ’ 

^4 ~ ^ — 7f 

t, = *1 - 7 

fc* = *? - 7(*i + *») 

d = 7Vi-V?, 


d, = -W, 


s,=W 


c, = 7 Vs-V*, 


d, = -V,K, 


e, = K,F s 


c, = 7 V, - V 4 ’, 


dj = -W, 


s, = F 4 K, 


The Euler fluxes may be written as: 





f e. > 


f * 1 
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f e, \ 

'■ (V) = 7, 

c» 

d, 

d. 

. W) = |? 
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ll 
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d, 

d, 

c* 








The matrix Aq and its inverse are given by 



f-v? 


e* 
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n* ) 


The Jacobians of the Euler fluxes are: 
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APPENDIX C 


PETROV-GALERKIN OPERATORS FOR ADVECTION EQUATION 


Chapter 2 described the Petrov-Galerkin formulation which uses the 
streamline, discontinuity capturing, and reduced discontinuity captur- 
ing operators to detail flow discontinuities. The need for these three 
operators can be illustrated by application of the Petrov-Galerkin 
formulation to model equations, such as the advection equation. The 
advection of u can be written as, 


u, t + a T vu =0 (C.l) 

where u = u(x,t) and a is the characteristic vector. The simple Galer- 
kin formulation results in the weighted residual equation given by. 




Vu) dA = 0 


(C.2) 


where W = N, the shape functions. For the finite element procedure the 
use of the Galerkin procedure is equivalent to a central differencing 
of the spatial derivatives which results in oscillatory solutions. 
Diffusion needs to be added to the scheme to suppress these oscilla- 
tions. The direction of the added diffusion can be obtained from the 
characteristic vector 7 which defines the direction of propagation of 
information. A way to add this diffusion is to modify the interpola- 
tion functions W by adding a discontinuous function p to N. The 
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function p can be written as. 


p = t a T 7N (C.3) 

where x depends on the characteristic vector a. The use of the charac- 

m* 

teristic vector (and the use of the characteristic matrices Aj for 
the compressible flow equations) injects into the finite element 
formulation the eigenvalue/eigenvector information of the hyperbolic 
system. The addition of p to the weighting functions can be shown to 
result in the addition of artificial dissipation terms to the advection 
equations. To illustrate this, consider the weighted residual 
statement of Eq. (C.2). With the definition of W as, 

W = N + p (C.4) 

Eq. (C.2) can be written as. 


/. 


(N + p) [u,, 


~T 
+ a 


7u] dA = 0 


(C. 5) 


or 




+ a T vu) 


dA 




dA 


(C.6) 


Using the discontinuous function p for the spatial derivatives only, 
Eq. (C.6) can be written as. 


J (u, t +*a T vu) dA - f p T “a T vu dA 
A A 


(C.7) 


or 


J N T (u, t +"a T Vu) dA = J (xH T 7N) T a T Vu dA (C.8) 


Using integration by parts, Eq. (C.8) can be written as, 




dA 



N T 7(axa T 7u) dA 


(C. 9) 
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which is the weighted residual statement of Eq. (C.l) with added 
artificial dissipation. 

The addition of the streamline diffusion terms enhances the 
ability of the Petrov-Galerkin formulation to capture discontinuities 
but the oscillations at these discontinuities are not completely elimi- 
nated. Additional diffusion needs to be added at the shocks. The 

direction of the streamlines is along a while the direction of the dis- 
continuity is grad(u) and these directions are shown in Fig. C.l. To 
capture discontinuities better, diffusion needs to be added normal to 
the discontinuity or along grad(u). 

To get the projection of "a onto the direction normal to the dis- 
continuity, a can be split into components in directions normal and 
parallel to the discontinuity as shown in Fig. C.l. 

a * a B + a^ (C.10) 

By definition, a satisfies the relation 

"ajj" vu = a"*" vu (C.ll) 

The addition of diffusion normal to the discontinuity is enabled 
using an which contains this directional information. The addition 
of the required dissipation can be accomplished by modifying the 
weighting function further to include the "discontinuity capturing" 
term. W is modified as 

W = N + p + q (C. 12) 

where q.can be written as, 

q = t, a[ vN (C. 13) 

The addition of q to the weighting function adds diffusive terms on the 




155 


right hand side similar to the streamline term and this is given by. 


L 


N T (u, t + a T Vu ) dA 


-J U J 

J A 


v(a ra T vu) dA + / N T v(a t a T vu) dA 

A (C. 14) 


With the contributions of p and q to the weighting function W, too 
much diffusion is added normal to the discontinuity. The diffusion 
needed at the discontinuity can be explicitly controlled by q which 
implies that the component of the streamline operator along the direc- 
tion of grad(u) is redundant and can be subtracted out. This avoids 
the occurrence of overly smoothed discontinuities. 

The use of streamline, discontinuity capturing and reduced 
discontinuity capturing terms in the weighting function serves to add 
artificial diffusion in a manner similar to the explicit viscosity 
schemes. The use of the characteristic vector and its projection in 
the direction of. the discontinuity provide control of the direction of 
the added diffusion. The directional addition of diffusion serves to 
provide accurate details of flow discontinuities. 



